lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190821103200.kpufwtviqhpbuv2n@willie-the-truck>
Date:   Wed, 21 Aug 2019 11:32:01 +0100
From:   Will Deacon <will@...nel.org>
To:     "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Joel Fernandes <joel@...lfernandes.org>,
        Alan Stern <stern@...land.harvard.edu>,
        rostedt <rostedt@...dmis.org>,
        Valentin Schneider <valentin.schneider@....com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Will Deacon <will.deacon@....com>,
        David Howells <dhowells@...hat.com>
Subject: Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates

On Tue, Aug 20, 2019 at 01:29:32PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 20, 2019 at 03:56:12PM +0200, Peter Zijlstra wrote:
> > On Sat, Aug 17, 2019 at 01:08:02AM -0700, Linus Torvalds wrote:
> > 
> > > The data tearing issue is almost a non-issue. We're not going to add
> > > WRITE_ONCE() to these kinds of places for no good reason.
> > 
> > Paulmck actually has an example of that somewhere; ISTR that particular
> > case actually got fixed by GCC, but I'd really _love_ for some compiler
> > people (both GCC and LLVM) to state that their respective compilers will
> > not do load/store tearing for machine word sized load/stores.
> 
> I do very much recall such an example, but I am now unable to either
> find it or reproduce it.  :-/
> 
> If I cannot turn it up in a few days, I will ask the LWN editors to
> make appropriate changes to the "Who is afraid" article.
> 
> > Without this written guarantee (which supposedly was in older GCC
> > manuals but has since gone missing), I'm loathe to rely on it.
> > 
> > Yes, it is very rare, but it is a massive royal pain to debug if/when it
> > does do happen.
> 
> But from what I can see, Linus is OK with use of WRITE_ONCE() for data
> races on any variable for which there is at least one READ_ONCE().
> So we can still use WRITE_ONCE() as we would like in our own code.
> Yes, you or I might be hit by someone else's omission of WRITE_ONCE(),
> it is better than the proverbial kick in the teeth.
> 
> Of course, if anyone knows of a compiler/architecture combination that
> really does tear stores of 32-bit constants, please do not keep it
> a secret!  After all, it would be good to get that addressed easily
> starting now rather than after a difficult and painful series of
> debugging sessions.

It's not quite what you asked for, but if you look at the following
silly code:

typedef unsigned long long u64;

struct data {
	u64 arr[1023];
	u64 flag;
};

void foo(struct data *x)
{
	int i;

	for (i = 0; i < 1023; ++i)
		x->arr[i] = 0;

	x->flag = 0;
}

void bar(u64 *x)
{
	*x = 0xabcdef10abcdef10;
}

Then arm64 clang (-O2) generates the following for foo:

foo:                                    // @foo
	stp	x29, x30, [sp, #-16]!   // 16-byte Folded Spill
	orr	w2, wzr, #0x2000
	mov	w1, wzr
	mov	x29, sp
	bl	memset
	ldp	x29, x30, [sp], #16     // 16-byte Folded Reload
	ret

and so the store to 'flag' has become part of the memset, which could
easily be bytewise in terms of atomicity (and this isn't unlikely given
we have a DC ZVA instruction which only guaratees bytewise atomicity).

GCC (also -O2) generates the following for bar:

bar:
	mov	w1, 61200
	movk	w1, 0xabcd, lsl 16
	stp	w1, w1, [x0]
	ret

and so it is using a store-pair instruction to reduce the complexity in
the immediate generation. Thus, the 64-bit store will only have 32-bit
atomicity. In fact, this is scary because if I change bar to:

void bar(u64 *x)
{
	*(volatile u64 *)x = 0xabcdef10abcdef10;
}

then I get:

bar:
	mov	w1, 61200
	movk	w1, 0xabcd, lsl 16
	str	w1, [x0]
	str	w1, [x0, 4]
	ret

so I'm not sure that WRITE_ONCE would even help :/

It's worth noting that:

void baz(atomic_long *x)
{
	atomic_store_explicit(x, 0xabcdef10abcdef10, memory_order_relaxed)
}

does the right thing:

baz:
	mov	x1, 61200
	movk	x1, 0xabcd, lsl 16
	movk	x1, 0xef10, lsl 32
	movk	x1, 0xabcd, lsl 48
	str	x1, [x0]
	ret

Whilst these examples may be contrived, I do thing they illustrate that
we can't simply say "stores to aligned, word-sized pointers are atomic".

Will

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ