[cpp-threads] Spurious failures of try_lock{_for}({rel_time}) vs. strong try_lock{_for}({rel_time})

Alexander Terekhov alexander.terekhov at gmail.com
Sat Dec 27 16:05:54 GMT 2008


On Wed, Dec 24, 2008 at 7:17 AM, Hans Boehm <Hans.Boehm at hp.com> wrote:
>
> On Tue, 23 Dec 2008, Alexander Terekhov wrote:
>
>> On Tue, Dec 23, 2008 at 7:44 AM, Hans Boehm <Hans.Boehm at hp.com> wrote:
>>>
>>> On Mon, 22 Dec 2008, Alexander Terekhov wrote:
>>>
>>>> N2800's try_lock():
>>>>
>>>> "Effects: Attempts to obtain ownership of the mutex for the calling
>>>> thread without blocking. If ownership is not obtained, there is no
>>>> effect and try_lock() immediately returns. An implementation may fail
>>>> to obtain the lock even if it is not held by any other thread."
>>>>
>>>> N2800's try_lock_for(rel_time):
>>>>
>>>> "Effects: The function attempts to obtain ownership of the mutex
>>>> within the time specified by rel_time. If the time specified by
>>>> rel_time is less than or equal to 0, the function attempts to obtain
>>>> ownership without blocking (as if by calling try_lock())."
>>>>
>>>> seem to contradict POSIX:
>>>>
>>>> http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_trylock.html
>>>>
>>>> "The pthread_mutex_trylock() function shall be equivalent to
>>>> pthread_mutex_lock(), except that if the mutex object referenced by
>>>> mutex is currently locked (by any thread, including the current
>>>> thread), the call shall return immediately."
>>>>
>>>> http://www.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_timedlock.html
>>>>
>>>> "Under no circumstance shall the function fail with a timeout if the
>>>> mutex can be locked immediately."
>>>
>>> I would argue that this is a mistake in the Posix standard.  Current
>>
>> I disagree. Current POSIX XBD 4.10 Memory Synchronization states that
>>
>> http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_10
>>
>> "Unless explicitly stated otherwise, if one of the above functions
>> returns an error, it is unspecified whether the invocation causes
>> memory to be synchronized."
>>
>> So there isn't any release-acquire pairing involving failed
>> try{timed}lock() in POSIX. See my interpretation of XBD 4.10 in terms
>> of release-acquire pairings:
>>
>> http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-April/000222.html
>
> I think your interpretation is quite different from my reading of
> the current Posix document.

What is your reading of the current POSIX document in this respect?

> AFAIK, the current document is still
> expressed in terms of "synchronizing memory".

Yes, but the use of that undefined term (which makes little sense on
its face) doesn't mean that the intent (see the POSIX XBD 4.10
Rationale quoted below) was to define a model any stronger than
release-acquire pairings. Your own survey showed that implementors do
not read POSIX that way.
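
To make "release-acquire pairing" concrete, here is a minimal sketch
of my reading (the code is mine, not quoted text): only a *successful*
lock pairs, as an acquire, with the unlock that released the mutex.

#include <pthread.h>
#include <assert.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int data = 0;   /* protected by m */
int ready = 0;  /* protected by m */

void *producer(void *arg) {
    pthread_mutex_lock(&m);
    data = 42;
    ready = 1;
    pthread_mutex_unlock(&m);   /* release side of the pairing */
    return 0;
}

void *consumer(void *arg) {
    int r = 0, done = 0;
    while (!done) {
        pthread_mutex_lock(&m); /* acquire side: pairs with the unlock above */
        if (ready) { r = data; done = 1; }
        pthread_mutex_unlock(&m);
    }
    /* Justified: the lock() that observed ready == 1 paired with the
       unlock() that published it. A *failed* trylock, per XBD 4.10,
       pairs with nothing and could justify no such conclusion. */
    assert(r == 42);
    return 0;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, 0, producer, 0);
    pthread_create(&t2, 0, consumer, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return 0;
}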

> The fact that trylock
> may not "synchronize memory" doesn't matter, since it can be followed
> by a lock acquisition and release (as it was in my example), that does
> "synchronize memory".

But not in a sense that makes the resulting program data-race-free, so
it doesn't matter.
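
Concretely, the sort of reasoning at stake looks like this (my
reconstruction of the pattern, not Hans's actual code): the reader
spins until trylock fails, then infers that the writer must hold the
mutex and so must already have written x.

#include <pthread.h>
#include <assert.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int x = 0;  /* ordinary data, not covered by any release-acquire pairing */

void *writer(void *arg) {
    x = 1;                  /* plain store outside any critical section */
    pthread_mutex_lock(&m); /* held from here on (sketch only) */
    return 0;
}

void *reader(void *arg) {
    /* POSIX's "strong" trylock fails only when the mutex is really
       held, so once it fails the writer must have executed "x = 1"... */
    while (pthread_mutex_trylock(&m) == 0)
        pthread_mutex_unlock(&m);
    /* ...but the failed trylock synchronizes nothing (XBD 4.10), so
       this read races with the writer's store: the assert is NOT
       justified. Deliberately broken example; do not use. */
    assert(x == 1);
    return 0;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, 0, writer, 0);
    pthread_create(&t2, 0, reader, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return 0;
}

Under N2800's try_lock(), which may fail spuriously, the reader's
inference is ruled out even before the memory-model question arises.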

> If you reexpress Posix in terms of release-acquire
> pairing, you end up with a different, more Java-like model.

Consider (from POSIX Rationale):

http://www.opengroup.org/onlinepubs/009695399/xrat/xbd_chap04.html#tag_01_04_10

------
A.4.10 Memory Synchronization

In older multi-processors, access to memory by the processors was
strictly multiplexed. This meant that a processor executing program
code interrogates or modifies memory in the order specified by the
code and that all the memory operations of all the processors in the
system appear to happen in some global order, though the operation
histories of different processors are interleaved arbitrarily. The
memory operations of such machines are said to be sequentially
consistent. In this environment, threads can synchronize using
ordinary memory operations. For example, a producer thread and a
consumer thread can synchronize access to a circular data buffer as
follows:

int rdptr = 0;
int wrptr = 0;
data_t buf[BUFSIZE];

Thread 1:
    while (work_to_do) {
        int next;

        buf[wrptr] = produce();
        next = (wrptr + 1) % BUFSIZE;
        while (rdptr == next)
            ;
        wrptr = next;
    }

Thread 2:
    while (work_to_do) {
        while (rdptr == wrptr)
            ;
        consume(buf[rdptr]);
        rdptr = (rdptr + 1) % BUFSIZE;
    }


In modern multi-processors, these conditions are relaxed to achieve
greater performance. If one processor stores values in location A and
then location B, then other processors loading data from location B
and then location A may see the new value of B but the old value of A.
The memory operations of such machines are said to be weakly ordered.
On these machines, the circular buffer technique shown in the example
will fail because the consumer may see the new value of wrptr but the
old value of the data in the buffer. In such machines, synchronization
can only be achieved through the use of special instructions that
enforce an order on memory operations. Most high-level language
compilers only generate ordinary memory operations to take advantage
of the increased performance. They usually cannot determine when
memory operation order is important and generate the special ordering
instructions. Instead, they rely on the programmer to use
synchronization primitives correctly to ensure that modifications to a
location in memory are ordered with respect to modifications and/or
access to the same location in other threads. Access to read-only data
need not be synchronized. The resulting program is said to be data
race-free.

Synchronization is still important even when accessing a single
primitive variable (for example, an integer). On machines where the
integer may not be aligned to the bus data width or be larger than the
data width, a single memory load may require multiple memory cycles.
This means that it may be possible for some parts of the integer to
have an old value while other parts have a newer value. On some
processor architectures this cannot happen, but portable programs
cannot rely on this.

In summary, a portable multi-threaded program, or a multi-process
program that shares writable memory between processes, has to use the
synchronization primitives to synchronize data access. It cannot rely
on modifications to memory being observed by other threads in the
order written in the application or even on modification of a single
variable being seen atomically.

Conforming applications may only use the functions listed to
synchronize threads of control with respect to memory access. There
are many other candidates for functions that might also be used.
Examples are: signal sending and reception, or pipe writing and
reading. In general, any function that allows one thread of control to
wait for an action caused by another thread of control is a candidate.
IEEE Std 1003.1-2001 does not require these additional functions to
synchronize memory access since this would imply the following:

All these functions would have to be recognized by advanced
compilation systems so that memory operations and calls to these
functions are not reordered by optimization.

All these functions would potentially have to have memory
synchronization instructions added, depending on the particular
machine.

The additional functions complicate the model of how memory is
synchronized and make automatic data race detection techniques
impractical.

Formal definitions of the memory model were rejected as unreadable by
the vast majority of programmers. In addition, most of the formal work
in the literature has concentrated on the memory as provided by the
hardware as opposed to the application programmer through the compiler
and runtime system. It was believed that a simple statement intuitive
to most programmers would be most effective. IEEE Std 1003.1-2001
defines functions that can be used to synchronize access to memory,
but it leaves open exactly how one relates those functions to the
semantics of each function as specified elsewhere in IEEE Std
1003.1-2001. IEEE Std 1003.1-2001 also does not make a formal
specification of the partial ordering in time that the functions can
impose, as that is implied in the description of the semantics of each
function. It simply states that the programmer has to ensure that
modifications do not occur "simultaneously" with other access to a
memory location.
-----
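
For contrast, here is how that circular buffer reads once the pairings
are spelled out with C++0x-style atomics (a sketch of mine, not part
of the Rationale; data_t, produce(), consume() and work_to_do are
placeholders, as in the quoted example):

#include <atomic>
#include <cstddef>

// Placeholders standing in for the Rationale's undeclared names:
typedef int data_t;
static data_t produce() { return 42; }
static void consume(data_t) {}
static std::atomic<bool> work_to_do(true);

const std::size_t BUFSIZE = 16;
data_t buf[BUFSIZE];
std::atomic<std::size_t> rdptr(0);
std::atomic<std::size_t> wrptr(0);

// Thread 1 (producer):
void producer() {
    while (work_to_do) {
        std::size_t w = wrptr.load(std::memory_order_relaxed); // own index
        buf[w] = produce();               // slot w is free by invariant
        std::size_t next = (w + 1) % BUFSIZE;
        while (rdptr.load(std::memory_order_acquire) == next)
            ;                             // full: wait for the consumer
        // Release store: pairs with the consumer's acquire load of wrptr,
        // ordering the write to buf[w] before the index becomes visible.
        wrptr.store(next, std::memory_order_release);
    }
}

// Thread 2 (consumer):
void consumer() {
    while (work_to_do) {
        std::size_t r = rdptr.load(std::memory_order_relaxed); // own index
        while (wrptr.load(std::memory_order_acquire) == r)
            ;                             // empty: wait for the producer
        consume(buf[r]);
        // Release store: pairs with the producer's acquire load of rdptr,
        // ordering the read of buf[r] before the slot is handed back.
        rdptr.store((r + 1) % BUFSIZE, std::memory_order_release);
    }
}

Each acquire load pairs with the release store on the same index in
the other thread; drop any one of the pairings and the Rationale's
failure mode reappears.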

See also:

http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-April/000222.html

regards,
alexander.


