Message-ID: <83b47424-e5e0-46de-aa63-d413a5aa6cec@paulmck-laptop>
Date: Wed, 6 Mar 2024 18:29:43 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Steven Rostedt <rostedt@...dmis.org>, linke li <lilinke99@...com>,
	joel@...lfernandes.org, boqun.feng@...il.com, dave@...olabs.net,
	frederic@...nel.org, jiangshanlai@...il.com, josh@...htriplett.org,
	linux-kernel@...r.kernel.org, mathieu.desnoyers@...icios.com,
	qiang.zhang1211@...il.com, quic_neeraju@...cinc.com,
	rcu@...r.kernel.org
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency bug

On Wed, Mar 06, 2024 at 11:46:10AM -0800, Linus Torvalds wrote:
> On Wed, 6 Mar 2024 at 11:27, Steven Rostedt <rostedt@...dmis.org> wrote:
> >
> > Note this has nothing to do with tracing. This thread is in RCU. I just
> > happen to receive the same patch "fix" for my code.
> 
> Ok, googling for rtort_pipe_count, I can only state that that code is
> complete garbage.

Well, you all do have to admit that I was right about something, namely
that Linus did not suffer in silence.  ;-)

TL;DR:  Those ->rtort_pipe_count increments cannot run concurrently
with each other or any other update of that field, so that update-side
READ_ONCE() call is unnecessary and the update-side plain C-language
read is OK.  The WRITE_ONCE() calls are there for the benefit of the
lockless read-side accesses to rtort_pipe_count.
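
In other words, the intended pattern is a plain read paired with a
marked store on the update side, and a marked load on the read side,
something like this sketch (shape only, not the actual rcutorture code):

	/* Update side: exactly one CPU increments, so a plain read suffices. */
	i = rp->rtort_pipe_count;
	WRITE_ONCE(rp->rtort_pipe_count, i + 1);  /* for the lockless readers */

	/* Read side: races with the increment, so the load must be marked. */
	pipe_count = READ_ONCE(p->rtort_pipe_count);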

Of course, I will fix.

> And no amount of READ_ONCE/WRITE_ONCE will fix it.
> 
> For one thing, we have this code:
> 
>         WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>         if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
> 
> which is broken by design. The compiler is allowed to (and probably
> does) turn that into just
> 
>         WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>         if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
> 
> which only results in the question "Why didn't the source code do that
> obvious simplification itself?"

Oddly enough, we got a patch to this effect just yesterday evening,
Pacific Time:

https://lore.kernel.org/all/tencent_5D8919B7D1F21906868DE81406015360270A@qq.com/

But that of course only fixes one aspect of this mess.

> So that code is actively *STUPID*. It's randomly mixing READ_ONCE and
> regular reads in ways that just makes me go: "there's no saving this
> shit".
> 
> This needs fixing. Having tests that have random code in them only
> makes me doubt that the *TEST* itself is correct, rather than the code
> it is trying to actually test.

Huh.  There should only be one CPU updating ->rtort_pipe_count at a time.
Which to your point would make that READ_ONCE() call not only unnecessary,
but confusing as well.  Here are the intended uses of ->rtort_pipe_count
(a code sketch follows the list):

1.	Zeroed when the item is first allocated.

2.	Incremented at the end of each subsequent grace period,
	so there is no concurrency with previous instances of
	this increment and also none with the zeroing.

	This happens in two different ways.  When testing things like
	call_rcu(), the RCU callback does the update, and if we have not
	yet reached the limit, passes that same callback to call_rcu().
	When testing other grace-period primitives (synchronize_rcu(),
	for example), we put the item on a list, and update each element
	of that list after each subsequent synchronous grace period.
	Once an element has passed through RCU_TORTURE_PIPE_LEN grace
	periods, it is removed from that list and freed.

	Either way, there is only ever one CPU incrementing a given
	->rtort_pipe_count at any given time.

3.	Freed, and not passed to another grace period, thus avoiding
	concurrency with a future #2.  Allocation and freeing are protected
	by a spinlock, so there is no concurrency with #1 above.  The same
	CPU that did the last increment does the free, so there is also
	no concurrency with the last #2.
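
Putting #1-#3 together, the call_rcu() flavor of the update looks
roughly like the following sketch (paraphrased, with statistics and
error handling elided, so treat the details as illustrative):

	/* Invoked after a grace period; only one CPU runs this for a given rp. */
	static void rcu_torture_cb_sketch(struct rcu_head *rhp)
	{
		struct rcu_torture *rp = container_of(rhp, struct rcu_torture, rtort_rcu);
		int i = rp->rtort_pipe_count;	/* exclusive writer: plain read is OK */

		WRITE_ONCE(rp->rtort_pipe_count, i + 1);
		if (i + 1 >= RCU_TORTURE_PIPE_LEN)
			rcu_torture_free(rp);	/* #3: spinlock-protected free */
		else
			call_rcu(&rp->rtort_rcu, rcu_torture_cb_sketch);	/* #2 again */
	}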

I fired up a KCSAN run with added ASSERT_EXCLUSIVE_WRITER() calls, which
agrees with this analysis.  (And I will run longer runs to double-check.)
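
For reference, that assertion just tells KCSAN to complain if any other
context writes the field while this one does, along these lines (a
sketch of the instrumentation, not the final patch):

	ASSERT_EXCLUSIVE_WRITER(rp->rtort_pipe_count);	/* KCSAN: no other writers */
	WRITE_ONCE(rp->rtort_pipe_count, i + 1);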

> And dammit, none of that makes sense anyway. This is not some
> performance-critical code. Why is it not using proper atomics if there
> is an actual data race?
> 
> The reason to use READ_ONCE() and WRITE_ONCE() is that they can be a
> lot faster than atomics, or - more commonly - because you have some
> fundamental algorithm that doesn't do arithmetic, but cares about some
> "state at time X" (the RCU _pointer_ being one such obvious case, but
> doing an *increment* sure as hell isn't).

The only data race is between the one-at-a-time increment and the
lockless reads in the RCU readers.  So the RCU readers need READ_ONCE()
for ->rtort_pipe_count, but again the updater does not.
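
The reader side therefore keeps its marked load, along the lines of
this sketch (names approximate):

	rcu_read_lock();
	p = rcu_dereference(rcu_torture_current);
	pipe_count = READ_ONCE(p->rtort_pipe_count);	/* races with the updater */
	rcu_read_unlock();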

Which means that the compiler optimization that Linus pointed out above
really is a valid one, because for once the compiler is correct that
nothing else is updating ->rtort_pipe_count.

> So using those READ_ONCE/WRITE_ONCE macros for that thing is
> fundamentally wrong to begin with.
> 
> The question should not be "should we add another READ_ONCE()". The
> question should be "what drugs were people on when writing this code"?

So what (if anything) was I thinking when I committed those update-side
READ_ONCE() calls?

202489101f2e6c ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race")

The commit log says the right thing, but I nevertheless added that
unnecessary READ_ONCE() call.  And here I was hoping to be able to blame
someone else!  ;-)

> People - please just stop writing garbage.
> 
> That 'rtort_pipe_count' should be an atomic_t, and the "add one and
> return the old value" should be an "atomic_inc_return()-1" (the "-1"
> is because "inc_return" returns the *new* value).
> 
> And feel free to add "_relaxed()" to that atomic op because this code
> doesn't care about ordering of that counter. It will help on some
> architectures, but as mentioned, this is not performance-critical code
> to begin with.
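
For completeness, Linus's suggestion would look something like this
sketch, assuming ->rtort_pipe_count were converted to atomic_t:

	int i = atomic_inc_return_relaxed(&rp->rtort_pipe_count) - 1;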

There is no update-side concurrency, so there is no need for atomics.
I just need to get rid of that extraneous update-side READ_ONCE() call.
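Concretely, the code that Linus quoted should end up shaped something
like this (a sketch of the intended fix, pending the actual patch):

	i = rp->rtort_pipe_count;		/* plain read replaces READ_ONCE() */
	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
	if (i + 1 >= RCU_TORTURE_PIPE_LEN) {	/* reuse i + 1, per Linus */
		rp->rtort_mbtest = 0;
		return true;
	}
	return false;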

Although I am not sure that I can credibly promise to never ever write
garbage, I most certainly can fix this particular instance.  :-/

							Thanx, Paul
