[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wj9Hr4PBobc13ZEv3HvFfpiZYrWX2-t5F62TXmMJoL5ZA@mail.gmail.com>
Date: Sat, 4 Jan 2025 14:05:49 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Manfred Spraul <manfred@...orfullife.com>, Christian Brauner <brauner@...nel.org>,
David Howells <dhowells@...hat.com>, WangYuli <wangyuli@...ontech.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: wakeup_pipe_readers/writers() && pipe_poll()
On Thu, 2 Jan 2025 at 08:33, Oleg Nesterov <oleg@...hat.com> wrote:
>
> I was going to send a one-liner patch which adds mb() into pipe_poll()
> but then I decided to make even more spam and ask some questions first.
poll functions are not *supposed* to need memory barriers.
They are supposed to do "poll_wait()" and then not need any more
serialization after that, because we either
(a) have a NULL wait-address, in which case we're not going to sleep
and this is just a "check state"
(b) the waiting function is supposed to do add_wait_queue() (usually
by way of __pollwait) and that should be a sufficient barrier to
anybody who does a wakeup
Note that add_wait_queue() ends up doing a spinlock sequence, and
while that is not a full memory barrier (well, it is on x86, but not
necessarily in general), it *should* be sufficient against an actual
waker.
That's kind of how add_wait_queue() vs wake_up() is supposed to work.
Of course, the fact that we're not discussing the pipe code *not*
doing a full wake sequence (but just a "is the wait queue empty"
thing) is what then messes with the generic rules.
And this makes me think that the whole comment above
waitqueue_active() is just fundamentally wrong. The smp_mb() is *not*
sufficient in the sequence
smp_mb();
if (waitqueue_active(wq_head))
wake_up(wq_head);
because while it happens to work wrt prepare_to_wait() sequences, is
is *not* against other users of add_wait_queue().
In those other sequences the smp_mb() in set_current_state might have
happened long long before.
Those other users aren't just 'poll()', btw. There's quite a lot of
add_wait_queue() users in the kernel. It's a traditional model even if
it's not something people generally add to any more.
Now, hopefully many of those add_wait_queue() users end up using
set_current_state() and getting the memory barrier that way. Or they
use wait_woken() or any of the other proper helpers we have.
But I think this poll() thing is very much an example of this *not*
being valid, and I don't think it's in any way pipe-specific.
So maybe we really do need to add the memory barrier to
__add_wait_queue(). That's going to be painful, particularly with lots
of users not needing it because they have the barrier in all the other
places.
End result: maybe adding it just to __pollwait() is the thing to do,
in the hopes that non-poll users all use the proper sequences.
But no, this is most definitely not a pipe-only thing.
Linus
Powered by blists - more mailing lists