[cpp-threads] modes, pass 2
Alexander Terekhov
alexander.terekhov at gmail.com
Thu May 12 11:48:59 BST 2005
Forgot one thing...
On 5/12/05, Alexander Terekhov <alexander.terekhov at gmail.com> wrote:
> [... JSR-133 cookbook and isync ...]
>
> > some people assume you already have a control dependency
> > and some don't. I changed this accordingly on cookbook page.
>
> Uhmm. FWIW I (still) don't like your use of Sparc's bidirectional
> fences to illustrate Java's RCsc. Even with control dependency,
> isync isn't quite LoadLoad (in the sense of a bidirectional fence
> that prohibits reordering of *all* preceding loads with respect
> to subsequent ones).
>
> http://www.cs.umd.edu/~pugh/java/memoryModel/archive/1222.html
> http://www.decadentplace.org.uk/pipermail/cpp-threads_decadentplace.org.uk/2005-May/000445.html
>
> Why don't you switch to hoist/sink stuff labels? Ok, perhaps with
> "msync constraint(allow_reordering pre, allow_reordering post)"
> calculator, so to speak. ;-)
You say in cookbook:
----
For descriptions of the underlying models supported on different processors,
see Sarita Adve et al, Recent Advances in Memory Consistency Models for
Hardware Shared-Memory Systems and Sarita Adve and Kourosh
Gharachorloo, Shared Memory Consistency Models: A Tutorial.
----
I think that
http://tinyurl.com/dzc8l
(Kourosh Gharachorloo, Anoop Gupta, and John Hennessy. Two
techniques to enhance the performance of memory consistency
models.)
is also quite illuminative: "acq, done, and store tag" fields, etc.
<quote>
The speculative-load buffer provides the detection mechanism by
signaling when the speculated result is incorrect. The buffer
works as follows. Loads that are retired from the reservation
station are put into the buffer in addition to being issued to
the memory system. There are four fields per entry (as shown in
Figure 4): load address, acq, done, and store tag. The load
address field holds the physical address for the load. The acq
field is set if the load is considered an acquire access. For
SC, all loads are treated as acquires. The done field is set
when the load is performed. If the consistency constraints
require the load to be delayed for a previous store, the store
tag uniquely identifies that store. A null store tag specifies
that the load depends on no previous stores. When a store
completes, its corresponding tag in the speculative-load buffer
is nullified if present. Entries are retired in a FIFO manner.
Two conditions need to be satisfied before an entry at the head
of the buffer is retired. First, the store tag field should
equal null. Second, the done field should be set if the acq
field is set. Therefore, for SC, an entry remains in the buffer
until all previous load and store accesses complete and the
load access it refers to completes. Appendix A describes how an
atomic read-modify-write can be incorporated in the above
implementation.
We now describe the detection mechanism. The following
coherence transactions are monitored by the speculative-load
buffer: invalidations (or ownership requests), updates, and
replacements. The load addresses in the buffer are
associatively checked for a match with the address of such
transactions. Multiple matches are possible. We assume the
match closest to the head of the buffer is reported. A match
in the buffer for an address that is being invalidated or
updated signals the possibility of an incorrect speculation. A
match for an address that is being replaced signifies that
future coherence transactions for that address will not be
sent to the processor. In either case, the speculated value
for the load is assumed to be incorrect. Guaranteeing the
constraints for release consistency can be done in a similar
way to SC. The conventional way to provide RC is to delay a
release access until its previous accesses complete and to
delay accesses following an acquire until the acquire
completes. Let us first consider delays for stores.
The mechanism that provides precise interrupts by holding
back store accesses in the store buffer is sufficient for
guaranteeing that stores are delayed for the previous
acquire. Although the mechanism described is stricter than
what RC requires, the conservative implementation is required
for providing precise interrupts. The same mechanism also
guarantees that a release (which is simply a special store
access) is delayed for previous load accesses. To guarantee a
release is also delayed for previous store accesses, the store
buffer delays the issue of the release operation until all
previously issued stores are complete. In contrast to SC,
however, ordinary stores are issued in a pipelined manner.
</quote>
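To make the retire and detection rules in the quote concrete, here is a rough software model of the speculative-load buffer. It is only a sketch of my reading of the quoted text, not the paper's hardware design: the names (Entry, SpeculativeLoadBuffer, issue_load, load_performed, store_completed, retire, snoop, kNoStore) are made up for illustration, a store tag is modelled as a plain int with -1 standing in for "null", and the read-modify-write handling from Appendix A is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for the "null" store tag.
const int kNoStore = -1;

// One buffer entry, with the four fields named in the quote.
struct Entry {
    std::uint64_t load_address;  // physical address of the load
    bool acq;                    // load is treated as an acquire (always true for SC)
    bool done;                   // set once the load has performed
    int store_tag;               // earlier store the load must wait for, or kNoStore
};

class SpeculativeLoadBuffer {
public:
    // A load leaving the reservation station is recorded here
    // (and, in the real design, issued to the memory system as well).
    void issue_load(std::uint64_t addr, bool acq, int store_tag) {
        Entry e = {addr, acq, false, store_tag};
        entries_.push_back(e);
    }

    // Mark the oldest not-yet-performed load to addr as done.
    void load_performed(std::uint64_t addr) {
        for (Entry& e : entries_) {
            if (e.load_address == addr && !e.done) { e.done = true; return; }
        }
    }

    // When a store completes, nullify its tag wherever it appears.
    void store_completed(int tag) {
        for (Entry& e : entries_) {
            if (e.store_tag == tag) e.store_tag = kNoStore;
        }
    }

    // FIFO retirement: the head leaves only if its store tag is null and,
    // when acq is set, its done field is set. Returns how many entries retired.
    int retire() {
        int retired = 0;
        while (!entries_.empty()) {
            const Entry& head = entries_.front();
            if (head.store_tag != kNoStore) break;
            if (head.acq && !head.done) break;
            entries_.pop_front();
            ++retired;
        }
        return retired;
    }

    // Detection: an invalidation, update, or replacement for addr is checked
    // against all entries; the match closest to the head is reported.
    // Returns its index, or -1 if there is no match. A match means the
    // speculated value must be assumed incorrect.
    int snoop(std::uint64_t addr) const {
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (entries_[i].load_address == addr) return static_cast<int>(i);
        }
        return -1;
    }

    int size() const { return static_cast<int>(entries_.size()); }

private:
    std::deque<Entry> entries_;
};
```

With acq set on every load this behaves like the SC case in the quote: an entry stays until its earlier store completes and the load itself performs. With acq clear, a load whose store tag is null can retire without waiting for done, which is the RC relaxation as I read it.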
regards,
alexander.