linux-kernel - Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture


Message-ID: <CAHk-=wgPAZ4KnCQergqAOUypwinYh=gZ0q4EQbwvuUcJ_8UK+Q@mail.gmail.com>
Date: Wed, 6 Mar 2024 11:46:10 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>, linke li <lilinke99@...com>, joel@...lfernandes.org, 
	boqun.feng@...il.com, dave@...olabs.net, frederic@...nel.org, 
	jiangshanlai@...il.com, josh@...htriplett.org, linux-kernel@...r.kernel.org, 
	mathieu.desnoyers@...icios.com, qiang.zhang1211@...il.com, 
	quic_neeraju@...cinc.com, rcu@...r.kernel.org
Subject: Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture_writer()
 data race and concurrency bug

On Wed, 6 Mar 2024 at 11:27, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> Note this has nothing to do with tracing. This thread is in RCU. I just
> happen to receive the same patch "fix" for my code.

Ok, googling for rtort_pipe_count, I can only state that that code is
complete garbage.

And no amount of READ_ONCE/WRITE_ONCE will fix it.

For one thing, we have this code:

        WRITE_ONCE(rp->rtort_pipe_count, i + 1);
        if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {

which is broken by design. The compiler is allowed to (and probably
does) turn that into just

        WRITE_ONCE(rp->rtort_pipe_count, i + 1);
        if (i + 1 >= RCU_TORTURE_PIPE_LEN) {

which only results in the question "Why didn't the source code do that
obvious simplification itself?"
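
To illustrate the simplification described above, here is a minimal userspace sketch, not the rcutorture code itself. The READ_ONCE()/WRITE_ONCE() macros are simplified volatile-cast stand-ins for the kernel's versions (they rely on the GCC/Clang __typeof__ extension), and the names pipe_item_sketch, pipe_advance_sketch and PIPE_LEN_SKETCH, along with the value 10, are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (assumption: GCC/Clang). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for RCU_TORTURE_PIPE_LEN. */
#define PIPE_LEN_SKETCH 10

/* Hypothetical stand-in for the structure that holds the counter. */
struct pipe_item_sketch {
	int rtort_pipe_count;
};

/*
 * Read the counter once, store the incremented value once, and compare
 * the local copy instead of re-reading the field with a plain load.
 * This only removes the mixed marked/plain access; the read-modify-write
 * is still not atomic with respect to concurrent writers.
 */
static bool pipe_advance_sketch(struct pipe_item_sketch *rp)
{
	assert(rp != NULL);

	int i = READ_ONCE(rp->rtort_pipe_count);

	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
	return i + 1 >= PIPE_LEN_SKETCH;
}
```

Comparing the local value makes the source say what the compiler is already free to generate, so the reader no longer has to wonder whether the second load was meant to observe another thread's store.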

So that code is actively *STUPID*. It's randomly mixing READ_ONCE and
regular reads in ways that just make me go: "there's no saving this
shit".

This needs fixing. Having tests that have random code in them only
makes me doubt that the *TEST* itself is correct, rather than the code
it is trying to actually test.

And dammit, none of that makes sense anyway. This is not some
performance-critical code. Why is it not using proper atomics if there
is an actual data race?

The reason to use READ_ONCE() and WRITE_ONCE() is that they can be a
lot faster than atomics, or - more commonly - because you have some
fundamental algorithm that doesn't do arithmetic, but cares about some
"state at time X" (the RCU _pointer_ being one such obvious case, but
doing an *increment* sure as hell isn't).

So using those READ_ONCE/WRITE_ONCE macros for that thing is
fundamentally wrong to begin with.

The question should not be "should we add another READ_ONCE()". The
question should be "what drugs were people on when writing this code"?

People - please just stop writing garbage.

That 'rtort_pipe_count' should be an atomic_t, and the "add one and
return the old value" should be an "atomic_inc_return()-1" (the "-1"
is because "inc_return" returns the *new* value).

And feel free to add "_relaxed()" to that atomic op because this code
doesn't care about ordering of that counter. It will help on some
architectures, but as mentioned, this is not performance-critical code
to begin with.
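
The "atomic_inc_return()-1" shape suggested above can be sketched in userspace with C11 atomics. This is an approximation under stated assumptions, not kernel code: atomic_int stands in for the kernel's atomic_t, inc_return_relaxed_sketch() is a hypothetical helper that mimics the "returns the new value" convention of atomic_inc_return_relaxed() on top of atomic_fetch_add_explicit() (which returns the old value), and the struct name, function names and the limit of 10 are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RCU_TORTURE_PIPE_LEN. */
#define PIPE_LEN_ATOMIC_SKETCH 10

/* Hypothetical stand-in for the structure, with an atomic counter. */
struct atomic_item_sketch {
	atomic_int rtort_pipe_count;
};

/*
 * Mimic the kernel convention: increment and return the NEW value.
 * C11 fetch_add returns the OLD value, hence the "+ 1".
 * Relaxed ordering: only atomicity of the counter is needed here.
 */
static int inc_return_relaxed_sketch(atomic_int *v)
{
	return atomic_fetch_add_explicit(v, 1, memory_order_relaxed) + 1;
}

/*
 * "Add one and return the old value" as a single atomic operation.
 * Stores the old value in *old_out and reports whether the new value
 * has reached the pipe length.
 */
static bool pipe_advance_atomic_sketch(struct atomic_item_sketch *rp,
				       int *old_out)
{
	assert(rp != NULL && old_out != NULL);

	int old = inc_return_relaxed_sketch(&rp->rtort_pipe_count) - 1;

	*old_out = old;
	return old + 1 >= PIPE_LEN_ATOMIC_SKETCH;
}
```

Because the increment is one atomic read-modify-write, two concurrent callers can never observe the same old value, which separate READ_ONCE() and WRITE_ONCE() calls cannot guarantee.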

                Linus