Message-ID: <20240307082059.1dc2582d@gandalf.local.home>
Date: Thu, 7 Mar 2024 08:20:59 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>, linke li <lilinke99@...com>,
 joel@...lfernandes.org, boqun.feng@...il.com, dave@...olabs.net,
 frederic@...nel.org, jiangshanlai@...il.com, josh@...htriplett.org,
 linux-kernel@...r.kernel.org, mathieu.desnoyers@...icios.com,
 qiang.zhang1211@...il.com, quic_neeraju@...cinc.com, rcu@...r.kernel.org
Subject: Re: [PATCH] rcutorture: Fix
 rcu_torture_pipe_update_one()/rcu_torture_writer() data race and
 concurrency bug

On Wed, 6 Mar 2024 12:06:00 -0800
Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Wed, 6 Mar 2024 at 11:45, Steven Rostedt <rostedt@...dmis.org> wrote:
> >
> > Here's the back story. I received the following patch:
> >
> >   https://lore.kernel.org/all/tencent_BA1473492BC618B473864561EA3AB1418908@qq.com/
> >
> > I didn't like it. My reply was:
> >  
> >         > -     rbwork->wait_index++;
> >         > +     WRITE_ONCE(rbwork->wait_index, READ_ONCE(rbwork->wait_index) + 1);  
> >
> >         I mean the above is really ugly. If this is the new thing to do, we need
> >         better macros.
> >
> >         If anything, just convert it to an atomic_t.  
> 
> The right thing is definitely to convert it to an atomic_t.

Agreed.

> 
> The memory barriers can probably also be turned into atomic ordering,
> although we don't always have all the variates.
> 
> But for example, that
> 
>                 /* Make sure to see the new wait index */
>                 smp_rmb();
>                 if (wait_index != work->wait_index)
>                         break;
> 
> looks odd, and should probably do an "atomic_read_acquire()" instead
> of a rmb and a (non-atomic and non-READ_ONCE thing).
> 
> The first READ_ONCE() should probably also be that atomic_read_acquire() op.
> 
> On the writing side, my gut feel is that the
> 
>         rbwork->wait_index++;
>         /* make sure the waiters see the new index */
>         smp_wmb();
> 
> should be an "atomic_inc_release(&rbwork->wait_index);" but we don't
> actually have that operation. We only have the "release" versions for
> things that return a value.
> 
> So it would probably need to be either
> 
>         atomic_inc(&rbwork->wait_index);
>         /* make sure the waiters see the new index */
>         smp_wmb();
> 
> or
> 
>         atomic_inc_return_release(&rbwork->wait_index);
> 
> or we'd need to add the "basic atomics with ordering semantics" (which
> we aren't going to do unless we end up with a lot more people who want
> them).
> 
> I dunno. I didn't look all *that* closely at the code. The above might
> be garbage too. Somebody who actually knows the code should think
> about what ordering they actually were looking for.
> 
> (And I note that 'wait_index' is of type 'long' in 'struct
> rb_irq_work', so I guess it should be "atomic_long_t" instead -  just
> shows how little attention I paid on the first read-through, which
> should make everybody go "I need to double-check Linus here")

It doesn't need to be long. I'm not even sure why I made it long. Probably
for natural alignment.

The code isn't that complex. You can have a list of waiters waiting for the
ring buffer to fill to various watermarks (in most cases, it's just one waiter).

When a write happens, it checks whether the smallest watermark has been hit
and, if so, calls irq_work to wake up all the waiters.

Each waiter then wakes up and exits the wait loop if a signal is pending or
if the ring buffer has hit the watermark that waiter was waiting for.
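
To make the watermark logic above concrete, here is a minimal userspace
sketch. Everything in it (struct waiter_sketch, writer_should_wake(),
waiter_done(), the integer fill level) is a hypothetical stand-in invented
for illustration, not the actual ring buffer code, which may well track the
smallest watermark differently instead of scanning a list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one task waiting on the ring buffer. */
struct waiter_sketch {
	int watermark;		/* fill level this waiter is waiting for */
	bool signal_pending;	/* stand-in for a pending signal on the task */
};

/* Writer side: wake everyone once the smallest requested watermark is hit. */
static bool writer_should_wake(const struct waiter_sketch *w, size_t n, int fill)
{
	if (n == 0)
		return false;

	int min = w[0].watermark;
	for (size_t i = 1; i < n; i++)
		if (w[i].watermark < min)
			min = w[i].watermark;

	return fill >= min;
}

/* Waiter side: after a wakeup, leave the loop on a signal or on our watermark. */
static bool waiter_done(const struct waiter_sketch *w, int fill)
{
	assert(w);
	return w->signal_pending || fill >= w->watermark;
}
```

With two waiters at watermarks 25 and 50 and a fill level of 30, the writer
wakes both, the first one exits its loop and the second one goes back to
waiting.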

The wait_index is just a way to force all waiters out of the wait loop
regardless of the watermark each waiter is waiting for. Before a waiter
goes into the wait loop, it saves the current wait_index. The waker
increments the wait_index and then calls the same irq_work to wake up all the
waiters.

After the wakeup happens, the waiter tests whether the current wait_index
still matches the value it saved before it entered the loop, and if it is
different, it falls out of the loop. Then the caller of ring_buffer_wait()
can re-evaluate whether it needs to enter the wait again.

The wait_index loop exit is needed when the file descriptor of a file that
uses a ring buffer is closed: all the readers of that file descriptor have
to be woken up so that their tasks are notified that the file closed.
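
The wait_index scheme described above can be sketched in userspace roughly
as follows. This is only an illustration: it uses C11 <stdatomic.h> with
release/acquire ordering as a stand-in for the kernel's atomic_t operations,
and struct rb_work_sketch and the three helpers are hypothetical names, not
the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct rb_irq_work. */
struct rb_work_sketch {
	atomic_int wait_index;	/* bumped to force every waiter out of its loop */
};

/* Waiter: save the current index before entering the wait loop. */
static int waiter_snapshot(struct rb_work_sketch *work)
{
	assert(work);
	return atomic_load_explicit(&work->wait_index, memory_order_acquire);
}

/* Waiter: after a wakeup, a changed index means "leave the loop". */
static bool waiter_forced_out(struct rb_work_sketch *work, int snapshot)
{
	return snapshot != atomic_load_explicit(&work->wait_index,
						memory_order_acquire);
}

/*
 * Waker: bump the index with release ordering. The real code would then
 * queue the irq_work that wakes the waiters; that part is omitted here.
 */
static void force_all_waiters_out(struct rb_work_sketch *work)
{
	(void)atomic_fetch_add_explicit(&work->wait_index, 1,
					memory_order_release);
}
```

A waiter that took its snapshot before force_all_waiters_out() ran sees a
different index afterwards and drops out, whatever watermark it was waiting
for.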

So we can switch the:

	rbwork->wait_index++;
	smp_wmb();

into just a:

	(void)atomic_inc_return_release(&rbwork->wait_index);

and the:

	smp_rmb();
	if (wait_index != work->wait_index)

into:

	if (wait_index != atomic_read_acquire(&work->wait_index))

I'll write up a patch.

Hmm, I have the same wait_index logic at the higher level doing basically
the same thing (at the close of the file). I'll switch that over too.

-- Steve