linux-kernel - Re: [PATCH v2] make buffer_locked provide an acquire semantics

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220801154108.GA26280@willie-the-truck>
Date:   Mon, 1 Aug 2022 16:41:09 +0100
From:   Will Deacon <will@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Mikulas Patocka <mpatocka@...hat.com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Alan Stern <stern@...land.harvard.edu>,
        Andrea Parri <parri.andrea@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Nicholas Piggin <npiggin@...il.com>,
        David Howells <dhowells@...hat.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Akira Yokosawa <akiyks@...il.com>,
        Daniel Lustig <dlustig@...dia.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v2] make buffer_locked provide an acquire semantics

Hi Linus, Paul,

Apologies for the slow response here; believe it or not, I was attending
a workshop about memory ordering.

On Sun, Jul 31, 2022 at 10:30:11AM -0700, Paul E. McKenney wrote:
> On Sun, Jul 31, 2022 at 09:51:47AM -0700, Linus Torvalds wrote:
> > Even alpha is specified to be locally ordered wrt *one* memory
> > location, including for reads (See table 5-1: "Processor issue order",
> > and also 5.6.2.2: "Litmus test 2"). So if a previous read has seen a
> > new value, a subsequent read is not allowed to see an older one - even
> > without a memory barrier.
> > 
> > Will, Paul? Maybe that's only for overlapping loads/stores, not for
> > loads/loads. Because maybe alpha for once isn't the weakest possible
> > ordering.
> 
> The "bad boy" in this case is Itanium, which can do some VLIW reordering
> of accesses.  Or could, I am not sure that newer Itanium hardware
> does this.  But this is why Itanium compilers made volatile loads use
> the ld,acq instruction.
> 
> Which means that aligned same-sized marked accesses to a single location
> really do execute consistently with some global ordering, even on Itanium.

Although this is true, there's a really subtle issue which crops up if you
try to compose this read-after-read ordering with dependencies in the case
where the two reads read the same value (which is encapsulated by the
unusual RSW litmus test that I've tried to convert to C below):

/* Global definitions; assume everything zero-initialised */
struct foo {
	int *x;
};

int x;
struct foo foo;
struct foo *ptr;

/* CPU 0 */
WRITE_ONCE(x, 1);
WRITE_ONCE(foo.x, &x);

/*
 * Release ordering to ensure that somebody following a non-NULL ptr will
 * see a fully-initialised 'foo'. smp_[w]mb() would work as well.
 */
smp_store_release(&ptr, &foo);

/* CPU 1 */
int *xp1, *xp2, val;
struct foo *foop;

/* Load the global pointer and check that it's not NULL. */
foop = READ_ONCE(ptr);
if (!foop)
	return;

/*
 * Load 'foo.x' via the pointer we just loaded. This is ordered after the
 * previous READ_ONCE() because of the address dependency.
 */
xp1 = READ_ONCE(foop->x);

/*
 * Load 'foo.x' directly via the global 'foo'.
 * _This is loading the same address as the previous READ_ONCE() and
 *  therefore cannot return a stale (NULL) value!_
 */
xp2 = READ_ONCE(foo.x);

/*
 * Load 'x' via the pointer we just loaded.
 * _We may see zero here!_
 */
val = READ_ONCE(*xp2);

So in this case, the two same-location reads on CPU1 are actually executed
out of order, but you can't tell just by looking at them in isolation
because they returned the same value (i.e. xp1 == xp2 == &x). However, you
*can* tell when you compose them in funny situations such as above (and I
believe that this is demonstrably true on PPC and Arm; effectively the
read-after-read ordering machinery only triggers in response to incoming
snoops).

There's probably a more-compelling variant using an (RCpc) load acquire
instead of the last address dependency on CPU 1 as well.

Anyway, I think all I'm trying to say is that I'd tend to shy away from
relying on read-after-read ordering if it forms part of a more involved
ordering relationship.

> > But the patch looks fine, though I agree that the ordering in
> > __wait_on_buffer should probably be moved into
> > wait_on_bit/wait_on_bit_io.
> > 
> > And would we perhaps want the bitops to have the different ordering
> > versions? Like "set_bit_release()" and "test_bit_acquire()"? That
> > would seem to be (a) cleaner and (b) possibly generate better code for
> > architectures where that makes a difference?
> 
> As always, I defer to the architecture maintainers on this one.

FWIW, that makes sense to me. We already have release/acquire/releaxed
variants of the bitops in the atomic_t API so it seems natural to have
a counterpart in the actual bitops API as well.

Will