[cpp-threads] modes

Sat May 7 17:59:23 BST 2005

Doug Lea wrote:

>> The first problem is that it's missing unconstrained atomics.
>
> Sorry for not being clear. I meant that ordinary_read and
> ordinary_write (or some better names) would be in the atomic classes.
> Although even here we'd need to decide whether these
> reads/writes must be bitwise-atomic. I think we once agreed they
> should be.

But once we have these, we can no longer claim sequential consistency.

>> The second problem is that finer-grained constraints can eliminate
>> some of the barriers.
>
> I understand that. Can you tell me of a compelling use case where you
> can show that it matters enough to be worthwhile supporting,
> especially given the potential need to go from describing N^2 to
> (N+1)^2 possible interactions across modes?

I think that this particular ball is in Alexander's court... <whistles 
innocently>

(Seriously... I have zero practical experience with multiprocessor PowerPCs, 
Alphas, SPARC RMOs, or Itaniums, and no access to any of these, either. So 
while I can offer theoretical examples, I can't back them up with numbers.)

>>> 2. There's very little motivation to define CAS-with-acquire only.
>>
>> Isn't CAS with acquire on success the typical fast mutex lock path
>> (and the entire trylock implementation, for that matter)?
>
> I do agree that this one is a close call. But to the best of my
> knowledge, it is only faster when used for one style of implementing
> locks on one known platform.

SPARC RMO and Alpha would still gain by eliding the leading #LoadLoad | 
#StoreStore / mb, I believe.
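
For illustration only, here is how a CAS that takes acquire on success and
imposes no ordering on failure might look as a trylock. This is a sketch
under assumptions: it uses std::atomic and std::memory_order, which are
not part of this discussion, and the class name SpinTryLock is
hypothetical.

```cpp
#include <atomic>

// Hypothetical trylock built on a single CAS.
class SpinTryLock {
    std::atomic<bool> locked_{false};

public:
    // Acquire ordering applies only if the CAS succeeds; a failed attempt
    // imposes no ordering (relaxed), so no leading barrier is required.
    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(
            expected, true,
            std::memory_order_acquire,   // ordering on success
            std::memory_order_relaxed);  // ordering on failure
    }

    // Release on unlock pairs with the acquire in try_lock.
    void unlock() {
        locked_.store(false, std::memory_order_release);
    }
};
```

Whether the weaker failure ordering saves anything depends on the
platform; on one where every CAS is already a full fence, the two
orderings compile to the same instruction.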

> Of course, the broader point here is where do you draw the line?
> I didn't write my list insisting that these should be exactly what we
> end up with, but only thinking that they were the ones we didn't need
> to debate.

The alternative, which I've been pursuing based on Alexander's 
unidirectional constraint set, is to let implementations draw the line in 
whatever manner they see fit.

The basic idea is that we have the following primitive constraints:

- hoist-load barrier
- hoist-store barrier
- sink-load barrier
- sink-store barrier

that can be combined freely, giving 2^4 = 16 combinations (leaving cc* and
dd* aside for now).

The implementation can replace an operation with constraint C1 by the same
operation with constraint C2, provided that C2 is at least as strong as C1,
that is, C2 includes every primitive barrier in C1: (C2 & C1) == C1.
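
A minimal sketch of this rule, assuming each primitive constraint is
encoded as one bit. The bit assignments and the name at_least_as_strong
are hypothetical; only the test (C2 & C1) == C1 and the composition of
acquire and full fence come from the text.

```cpp
// Hypothetical encoding: one bit per primitive constraint.
constexpr unsigned hoist_load  = 1u << 0;
constexpr unsigned hoist_store = 1u << 1;
constexpr unsigned sink_load   = 1u << 2;
constexpr unsigned sink_store  = 1u << 3;

// Composite constraints named in the discussion.
constexpr unsigned acquire    = hoist_load | hoist_store;
constexpr unsigned full_fence = hoist_load | hoist_store
                              | sink_load  | sink_store;

// True if c2 may be substituted for c1, i.e. c2 contains every
// primitive barrier that c1 asks for.
constexpr bool at_least_as_strong(unsigned c2, unsigned c1) {
    return (c2 & c1) == c1;
}

// acquire can stand in for hoist-load, but not for sink-store.
static_assert(at_least_as_strong(acquire, hoist_load), "");
static_assert(!at_least_as_strong(acquire, sink_store), "");
static_assert(at_least_as_strong(full_fence, acquire), "");
```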

So let's say that you ask for an atomic load with hoist-load barrier, but 
the current platform only has:

- load with acquire (hoist-load + hoist-store) { mov on x86 }
- load with full fence (all four) { lock xadd with zero on x86 }

The weakest available constraint that still includes hoist-load is acquire,
so the implementation would pick that.
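
One way an implementation might make that choice is sketched below. It
assumes "closest" means the sufficient constraint with the fewest
primitive barriers, and that a full fence is always available as a
fallback. The bit encoding and the names bit_count and pick_constraint
are hypothetical.

```cpp
#include <initializer_list>

// Hypothetical encoding: one bit per primitive constraint.
constexpr unsigned hoist_load  = 1u << 0;
constexpr unsigned hoist_store = 1u << 1;
constexpr unsigned sink_load   = 1u << 2;
constexpr unsigned sink_store  = 1u << 3;
constexpr unsigned acquire     = hoist_load | hoist_store;
constexpr unsigned full_fence  = acquire | sink_load | sink_store;

// Number of primitive barriers in a constraint.
inline int bit_count(unsigned c) {
    int n = 0;
    for (; c != 0; c >>= 1) n += static_cast<int>(c & 1u);
    return n;
}

// Returns the available constraint that includes everything in
// `requested` and has the fewest primitive barriers. Falls back to
// full_fence, which is assumed to be available on every platform.
inline unsigned pick_constraint(unsigned requested,
                                std::initializer_list<unsigned> available) {
    unsigned best = full_fence;
    for (unsigned c : available) {
        bool sufficient = (c & requested) == requested;
        if (sufficient && bit_count(c) < bit_count(best)) best = c;
    }
    return best;
}
```

With the two x86 loads above as the available set,
pick_constraint(hoist_load, {acquire, full_fence}) selects acquire, while
a request for sink_store falls through to full_fence.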

A PowerPC platform can in principle offer eight or even twelve different 
load variations, but the implementation is unlikely to put much effort into 
optimizing the various load-with-release cases (and is free not to).