Message-ID: <81612721-9448-83fa-4efe-603996d56b9a@akamai.com>
Date: Fri, 1 May 2020 18:09:51 -0400
From: Jason Baron <jbaron@...mai.com>
To: Roman Penyaev <rpenyaev@...e.de>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>, Heiher <r@....cc>,
Khazhismel Kumykov <khazhy@...gle.com>,
Davidlohr Bueso <dbueso@...e.de>, stable@...r.kernel.org
Subject: Re: [PATCH] epoll: ensure ep_poll() doesn't miss wakeup events
On 5/1/20 5:02 PM, Roman Penyaev wrote:
> Hi Jason,
>
> That is indeed a nice catch.
> It seems we need an smp_rmb() pair between list_empty_careful(&ep->rdllist)
> and READ_ONCE(ep->ovflist) in ep_events_available(), don't we?
>
Hi Roman,
Good point. Even if we order those reads it's still racy, since the
read of the ready list could come after it has been cleared, and the
read of the overflow list could likewise come after it has been cleared.
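For reference, the lockless check in question reads the two fields
separately; roughly, from fs/eventpoll.c in this era:

static inline int ep_events_available(struct eventpoll *ep)
{
	/* two independent loads: the ready list and the overflow list */
	return !list_empty_careful(&ep->rdllist) ||
	       READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR;
}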
So I'm afraid we instead need something like this to make sure they
are read together:
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index d6ba0e5..31c5f14 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1899,7 +1899,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 			break;
 		}
+		write_lock_irq(&ep->lock);
 		eavail = ep_events_available(ep);
+		write_unlock_irq(&ep->lock);
 		if (eavail)
 			break;
 		if (signal_pending(current)) {
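
For completeness, here's a minimal userspace sketch of the
epfd1->epfd2->socket nesting from the changelog below. Names and
timing are illustrative only; it builds the topology with two waiter
threads but won't deterministically hit the race:

#include <sys/epoll.h>
#include <sys/socket.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int epfd1, epfd2;

static void *waiter(void *arg)
{
	int epfd = *(int *)arg;
	struct epoll_event ev;

	/* thread a blocks on epfd1, thread b on epfd2 */
	int n = epoll_wait(epfd, &ev, 1, 5000);
	printf("epoll_wait(%d) returned %d\n", epfd, n);
	return NULL;
}

int main(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	pthread_t a, b;
	int sv[2];

	socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
	epfd1 = epoll_create1(0);
	epfd2 = epoll_create1(0);

	/* epfd2 watches the socket, epfd1 watches epfd2 */
	ev.data.fd = sv[0];
	epoll_ctl(epfd2, EPOLL_CTL_ADD, sv[0], &ev);
	ev.data.fd = epfd2;
	epoll_ctl(epfd1, EPOLL_CTL_ADD, epfd2, &ev);

	pthread_create(&a, NULL, waiter, &epfd1);
	pthread_create(&b, NULL, waiter, &epfd2);

	write(sv[1], "x", 1);	/* step 1: data comes in on the socket */

	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

With the bug, thread b can observe ep_events_available() == false after
thread a's ep_scan_ready_list() has spliced the ready list away, and go
back to sleep despite the pending event.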
Thanks,
-Jason
> Other than that:
>
> Reviewed-by: Roman Penyaev <rpenyaev@...e.de>
>
> --
> Roman
>
> On 2020-05-01 21:15, Jason Baron wrote:
>> Now that the ep_events_available() check is done in a lockless way, and
>> we no longer perform wakeups from ep_scan_ready_list(), we need to ensure
>> that either ep->rdllist has items or the overflow list is active. Prior to
>> commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested
>> epoll"), we did wake_up(&ep->wq) after manipulating the ep->rdllist and the
>> overflow list. Thus, any waiters would observe the correct state. However,
>> with that wake_up() now removed we need to be more careful to ensure that
>> condition.
>>
>> Here's an example of what could go wrong:
>>
>> We have epoll fds: epfd1, epfd2. And epfd1 is added to epfd2 and epfd2 is
>> added to a socket: epfd1->epfd2->socket. Thread a is doing epoll_wait() on
>> epfd1, and thread b is doing epoll_wait() on epfd2. Then:
>>
>> 1) data comes in on socket
>>
>> ep_poll_callback() wakes up threads a and b
>>
>> 2) thread a runs
>>
>> ep_poll()
>> ep_scan_ready_list()
>> ep_send_events_proc()
>> ep_item_poll()
>> ep_scan_ready_list()
>> list_splice_init(&ep->rdllist, &txlist);
>>
>> 3) now thread b is running
>>
>> ep_poll()
>> ep_events_available()
>> returns false
>> schedule_hrtimeout_range()
>>
>> Thus, thread b has now scheduled and missed the wakeup.
>>
>> Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
>> Signed-off-by: Jason Baron <jbaron@...mai.com>
>> Cc: Alexander Viro <viro@...iv.linux.org.uk>
>> Cc: Heiher <r@....cc>
>> Cc: Roman Penyaev <rpenyaev@...e.de>
>> Cc: Khazhismel Kumykov <khazhy@...gle.com>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Davidlohr Bueso <dbueso@...e.de>
>> Cc: <stable@...r.kernel.org>
>> ---
>> fs/eventpoll.c | 23 +++++++++++++++++------
>> 1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>> index aba03ee749f8..4af2d020f548 100644
>> --- a/fs/eventpoll.c
>> +++ b/fs/eventpoll.c
>> @@ -704,8 +704,14 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>>  	 * in a lockless way.
>>  	 */
>>  	write_lock_irq(&ep->lock);
>> -	list_splice_init(&ep->rdllist, &txlist);
>>  	WRITE_ONCE(ep->ovflist, NULL);
>> +	/*
>> +	 * In ep_poll() we use ep_events_available() in a lockless way to decide
>> +	 * if events are available. So we need to preserve that either
>> +	 * ep->ovflist != EP_UNACTIVE_PTR or there are events on the ep->rdllist.
>> +	 */
>> +	smp_wmb();
>> +	list_splice_init(&ep->rdllist, &txlist);
>>  	write_unlock_irq(&ep->lock);
>>
>>  	/*
>> @@ -737,16 +743,21 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>>  		}
>>  	}
>>  	/*
>> +	 * Quickly re-inject items left on "txlist".
>> +	 */
>> +	list_splice(&txlist, &ep->rdllist);
>> +	/*
>> +	 * In ep_poll() we use ep_events_available() in a lockless way to decide
>> +	 * if events are available. So we need to preserve that either
>> +	 * ep->ovflist != EP_UNACTIVE_PTR or there are events on the ep->rdllist.
>> +	 */
>> +	smp_wmb();
>> +	/*
>>  	 * We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
>>  	 * releasing the lock, events will be queued in the normal way inside
>>  	 * ep->rdllist.
>>  	 */
>>  	WRITE_ONCE(ep->ovflist, EP_UNACTIVE_PTR);
>> -
>> -	/*
>> -	 * Quickly re-inject items left on "txlist".
>> -	 */
>> -	list_splice(&txlist, &ep->rdllist);
>>  	__pm_relax(ep->ws);
>>  	write_unlock_irq(&ep->lock);
>