linux-kernel - Re: [PATCH] rcutorture: Fix rcu_torture_pipe_update_one()/rcu_torture

Message-ID: <29a20fd4-ac5e-44a3-bc8a-9f77aa6a3cf9@paulmck-laptop>
Date: Wed, 6 Mar 2024 19:21:14 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Steven Rostedt <rostedt@...dmis.org>, linke li <lilinke99@...com>,
	joel@...lfernandes.org, boqun.feng@...il.com, dave@...olabs.net,
	frederic@...nel.org, jiangshanlai@...il.com, josh@...htriplett.org,
	linux-kernel@...r.kernel.org, mathieu.desnoyers@...icios.com,
	qiang.zhang1211@...il.com, quic_neeraju@...cinc.com,
	rcu@...r.kernel.org
Subject: Re: [PATCH] rcutorture: Fix
 rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency
 bug

On Wed, Mar 06, 2024 at 06:49:38PM -0800, Linus Torvalds wrote:
> On Wed, 6 Mar 2024 at 18:43, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > I dunno.
> 
> Oh, and just looking at that patch, I still think the code is confused.
> 
> On the reading side, we have:
> 
>     pipe_count = smp_load_acquire(&p->rtort_pipe_count);
>     if (pipe_count > RCU_TORTURE_PIPE_LEN) {
>         /* Should not happen, but... */
> 
> where that comment clearly says that the pipe_count we read (whether
> with READ_ONCE() or with my smp_load_acquire() suggestion) should
> never be larger than RCU_TORTURE_PIPE_LEN.

I will fix that comment.  It should not happen *if* RCU is working
correctly.  It can happen if you have an RCU that is so broken that a
single RCU reader can span more than ten grace periods.  An example of
an RCU that really is this broken can be selected using rcutorture's
torture_type=busted module parameter.  No surprise, given that its
implementation of call_rcu() invokes the callback function directly and
its implementation of synchronize_rcu() is a complete no-op.  ;-)

Of course, the purpose of that value of the torture_type module parameter
(along with all other possible values containing the string "busted")
is to test rcutorture itself.

> But the writing side very clearly did:
> 
>     i = rp->rtort_pipe_count;
>     if (i > RCU_TORTURE_PIPE_LEN)
>         i = RCU_TORTURE_PIPE_LEN;
>     ...
>     smp_store_release(&rp->rtort_pipe_count, ++i);
> 
> (again, syntactically it could have been "i + 1" instead of my "++i" -
> same value), so clearly the writing side *can* write a value that is >
> RCU_TORTURE_PIPE_LEN.
> 
> So while the whole READ/WRITE_ONCE vs smp_load_acquire/store_release
> is one thing that might be worth looking at, I think there are other
> very confusing aspects here.

With this change in that comment, are things better?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 6b821a7037b03..0cb5452ecd945 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2000,7 +2000,8 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp, long myid)
 	preempt_disable();
 	pipe_count = READ_ONCE(p->rtort_pipe_count);
 	if (pipe_count > RCU_TORTURE_PIPE_LEN) {
-		/* Should not happen, but... */
+		// Should not happen in a correct RCU implementation,
+		// happens quite often for torture_type=busted.
 		pipe_count = RCU_TORTURE_PIPE_LEN;
 	}
 	completed = cur_ops->get_gp_seq();