[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250106163038.GE7233@redhat.com>
Date: Mon, 6 Jan 2025 17:30:39 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Manfred Spraul <manfred@...orfullife.com>,
Christian Brauner <brauner@...nel.org>,
David Howells <dhowells@...hat.com>,
WangYuli <wangyuli@...ontech.com>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: wakeup_pipe_readers/writers() && pipe_poll()
On 01/04, Linus Torvalds wrote:
>
> On Thu, 2 Jan 2025 at 08:33, Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > I was going to send a one-liner patch which adds mb() into pipe_poll()
> > but then I decided to make even more spam and ask some questions first.
>
> poll functions are not *supposed* to need memory barriers.
...
> But no, this is most definitely not a pipe-only thing.
Agreed, that is why I didn't send the patch which adds mb() into pipe_poll().
> They are supposed to do "poll_wait()" and then not need any more
> serialization after that, because we either
>
> (a) have a NULL wait-address, in which case we're not going to sleep
> and this is just a "check state"
To be honest, I don't understand the wait_address check in poll_wait(),
it seems that wait_address is never NULL.
But ->_qproc can be NULL if do_poll/etc does another iteration after
poll_schedule_timeout(), in this case we can sleep again. But this
case is fine.
> (b) the waiting function is supposed to do add_wait_queue() (usually
> by way of __pollwait) and that should be a sufficient barrier to
> anybody who does a wakeup
Yes.
> And this makes me think that the whole comment above
> waitqueue_active() is just fundamentally wrong. The smp_mb() is *not*
> sufficient in the sequence
>
> smp_mb();
> if (waitqueue_active(wq_head))
> wake_up(wq_head);
>
> because while it happens to work wrt prepare_to_wait() sequences, is
> is *not* against other users of add_wait_queue().
Well, this comment doesn't look wrong to me, but perhaps it can be
more clear. It should probably explain that this pseudo code is only
correct because the waiter has a barrier before "if (@cond)" which
pairs with smp_mb() above waitqueue_active(). It even says
prepare_to_wait(&wq_head, &wait, state);
// smp_mb() from set_current_state()
but perhaps this is not 100% clear.
> But I think this poll() thing is very much an example of this *not*
> being valid, and I don't think it's in any way pipe-specific.
Yes.
> So maybe we really do need to add the memory barrier to
> __add_wait_queue(). That's going to be painful, particularly with lots
> of users not needing it because they have the barrier in all the other
> places.
Yes, that is why I don't really like the idea to add mb() into
__add_wait_queue().
> End result: maybe adding it just to __pollwait() is the thing to do,
> in the hopes that non-poll users all use the proper sequences.
That is what I tried to propose. Will you agree with this change?
We can even use smp_store_mb(), say
@@ -224,11 +224,12 @@ static void __pollwait(struct file *filp, wait_queue_head_t *wait_address,
if (!entry)
return;
entry->filp = get_file(filp);
- entry->wait_address = wait_address;
entry->key = p->_key;
init_waitqueue_func_entry(&entry->wait, pollwake);
entry->wait.private = pwq;
add_wait_queue(wait_address, &entry->wait);
+ // COMMENT
+ smp_store_mb(entry->wait_address, wait_address);
Oleg.
Powered by blists - more mailing lists