Now this one was a trick question! ;–)
This trick question involved the following code fragment, where each function foo_n() runs on CPU n:
    int x, y;       /* shared variables */
    int r1, r2, r3; /* private variables */

    void foo_0(void)
    {
        ACCESS_ONCE(x) = 1;
    }

    void foo_1(void)
    {
        r1 = x;
        smp_rmb(); /* The only change. */
        r2 = y;
    }

    void foo_2(void)
    {
        y = 1;
        smp_mb();
        r3 = x;
    }
Suppose that the following assertion runs after all of the preceding functions complete. Can this assertion ever trigger?
assert(!(r1 == 1 && r2 == 0 && r3 == 0));
Once again, when it comes to memory barriers, a little intuition can be a very dangerous thing. Nevertheless, let's once again at least see where it leads, which is pretty much the same place as before.
Let's assume that the assertion can trigger. This means that r1 == 1, which means that foo_0() must have executed before foo_1() did.
For the assertion to trigger, we must also have r2 == 0, which means that foo_1() must have executed before foo_2() did.
But if foo_0() executed before foo_1() and foo_1() executed before foo_2(), then foo_2()'s load from x must assuredly see foo_0()'s store to x. This in turn means that r3 == 1, preventing the assertion from triggering.
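The chain of reasoning above quietly assumes that all five memory accesses fall into a single global order that every CPU agrees on. As a rough check of that intuition (and only of the intuition, not of any real hardware), the following sketch enumerates every such interleaving and counts those in which the assertion would trigger. The names sc_op, sc_state, and count_sc_triggers() are made up for this illustration.

```c
#include <stdbool.h>

/* One entry per memory access in the code fragment. */
enum sc_op { ST_X, LD_X_R1, LD_Y_R2, ST_Y, LD_X_R3, SC_UNUSED };

struct sc_state {
    int x, y;       /* shared variables */
    int r1, r2, r3; /* private variables */
};

/* Each row is one function's accesses in program order. */
static const enum sc_op sc_prog[3][2] = {
    { ST_X,    SC_UNUSED }, /* foo_0() */
    { LD_X_R1, LD_Y_R2   }, /* foo_1() */
    { ST_Y,    LD_X_R3   }, /* foo_2() */
};
static const int sc_len[3] = { 1, 2, 2 };

static void sc_run(struct sc_state *s, enum sc_op o)
{
    switch (o) {
    case ST_X:    s->x = 1;     break;
    case LD_X_R1: s->r1 = s->x; break;
    case LD_Y_R2: s->r2 = s->y; break;
    case ST_Y:    s->y = 1;     break;
    case LD_X_R3: s->r3 = s->x; break;
    default:                    break;
    }
}

/* Recursively try every interleaving that respects program order. */
static int sc_explore(struct sc_state s, int pc[], int *total)
{
    int hits = 0;
    bool done = true;

    for (int t = 0; t < 3; t++) {
        if (pc[t] == sc_len[t])
            continue;
        done = false;
        struct sc_state next = s;
        sc_run(&next, sc_prog[t][pc[t]]);
        pc[t]++;
        hits += sc_explore(next, pc, total);
        pc[t]--;
    }
    if (done) {
        (*total)++;
        /* Same condition that the assertion rejects. */
        if (s.r1 == 1 && s.r2 == 0 && s.r3 == 0)
            hits = 1;
    }
    return hits;
}

/* Returns the number of interleavings that trigger the assertion. */
int count_sc_triggers(int *total)
{
    struct sc_state s = { 0 };
    int pc[3] = { 0, 0, 0 };

    *total = 0;
    return sc_explore(s, pc, total);
}
```

Under this single-global-order assumption, count_sc_triggers() visits 30 interleavings and finds none that trigger the assertion, which is exactly what intuition predicts.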
So, once again, most people's intuitions would be much happier if the assertion could never trigger.
Digging through the Linux kernel documentation gives us the same result as before, namely that there is no guarantee.
But the truly insane will read the actual code along with all of the relevant hardware documentation.
Because ARM does not have a weak barrier that preserves the order of reads, ARM defines smp_rmb() to be the same as smp_mb(), which means that ARM gives the same result as before. (ARM does have a weaker memory barrier that orders stores, but this is not used for smp_wmb() as of the 2.6.34 Linux kernel.)
Power uses the lwsync instruction for smp_rmb(), which orders prior loads against subsequent loads and stores and also prior stores against subsequent stores. However, foo_0()'s store to x and foo_2()'s read from x constitute a store followed by a load, for which the lwsync instruction does not guarantee ordering.
The assertion can therefore trigger, and in fact does trigger on real hardware.
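One way to picture how this can happen is to drop the assumption that foo_0()'s store becomes visible to every CPU at the same instant. The following toy model is an assumption-laden sketch, not a description of Power's actual memory model: it splits that store into two events (arrival at CPU 1 and arrival at CPU 2), treats the store to y as instantly visible everywhere, keeps each function's own accesses in program order, and counts the interleavings that trigger the assertion. All names in it are hypothetical.

```c
#include <stdbool.h>

/* foo_0()'s store is modeled as two separate arrival events. */
enum toy_ev { X_TO_CPU1, X_TO_CPU2, T_LD_X_R1, T_LD_Y_R2, T_ST_Y, T_LD_X_R3 };

struct toy_state {
    int x_cpu1, x_cpu2; /* each CPU's own view of x */
    int y;              /* simplification: one copy seen by all */
    int r1, r2, r3;
};

/* Each row is an event sequence that must stay in order. */
static const enum toy_ev toy_seq[4][2] = {
    { X_TO_CPU1, X_TO_CPU1 }, /* second slot unused */
    { X_TO_CPU2, X_TO_CPU2 }, /* second slot unused */
    { T_LD_X_R1, T_LD_Y_R2 }, /* foo_1(), loads kept in order */
    { T_ST_Y,    T_LD_X_R3 }, /* foo_2(), store kept before load */
};
static const int toy_len[4] = { 1, 1, 2, 2 };

static void toy_run(struct toy_state *s, enum toy_ev e)
{
    switch (e) {
    case X_TO_CPU1: s->x_cpu1 = 1;     break;
    case X_TO_CPU2: s->x_cpu2 = 1;     break;
    case T_LD_X_R1: s->r1 = s->x_cpu1; break;
    case T_LD_Y_R2: s->r2 = s->y;      break;
    case T_ST_Y:    s->y = 1;          break;
    case T_LD_X_R3: s->r3 = s->x_cpu2; break;
    }
}

/* Recursively try every interleaving of the four sequences. */
static int toy_explore(struct toy_state s, int pc[], int *total)
{
    int hits = 0;
    bool done = true;

    for (int t = 0; t < 4; t++) {
        if (pc[t] == toy_len[t])
            continue;
        done = false;
        struct toy_state next = s;
        toy_run(&next, toy_seq[t][pc[t]]);
        pc[t]++;
        hits += toy_explore(next, pc, total);
        pc[t]--;
    }
    if (done) {
        (*total)++;
        /* Same condition that the assertion rejects. */
        if (s.r1 == 1 && s.r2 == 0 && s.r3 == 0)
            hits = 1;
    }
    return hits;
}

/* Returns the number of interleavings that trigger the assertion. */
int count_toy_triggers(int *total)
{
    struct toy_state s = { 0 };
    int pc[4] = { 0, 0, 0, 0 };

    *total = 0;
    return toy_explore(s, pc, total);
}
```

In this toy model, exactly one of the 180 interleavings triggers the assertion: the store to x reaches CPU 1 first, foo_1() and then foo_2() run to completion, and only afterwards does the store reach CPU 2.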
So if you want your memory accesses to act in a transitive fashion, use smp_mb() rather than either smp_rmb() or smp_wmb().