lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <81612721-9448-83fa-4efe-603996d56b9a@akamai.com>
Date:   Fri, 1 May 2020 18:09:51 -0400
From:   Jason Baron <jbaron@...mai.com>
To:     Roman Penyaev <rpenyaev@...e.de>
Cc:     akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        Alexander Viro <viro@...iv.linux.org.uk>, Heiher <r@....cc>,
        Khazhismel Kumykov <khazhy@...gle.com>,
        Davidlohr Bueso <dbueso@...e.de>, stable@...r.kernel.org
Subject: Re: [PATCH] epoll: ensure ep_poll() doesn't miss wakeup events



On 5/1/20 5:02 PM, Roman Penyaev wrote:
> Hi Jason,
> 
> That is indeed a nice catch.
> Seems we need smp_rmb() pair between list_empty_careful(&rp->rdllist) and
> READ_ONCE(ep->ovflist) for ep_events_available(), do we?
> 

Hi Roman,

Good point, even if we order those reads its still racy, since the
read of the ready list could come after its been cleared and the
read of the overflow could again come after its been cleared.

So I'm afraid we might need instead something like this to make
sure they are read together:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index d6ba0e5..31c5f14 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1899,7 +1899,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
                        break;
                }

+               write_lock_irq(&ep->lock);
                eavail = ep_events_available(ep);
+               write_unlock_irq(&ep->lock);
                if (eavail)
                        break;
                if (signal_pending(current)) {


Thanks,

-Jason



> Other than that:
> 
> Reviewed-by: Roman Penyaev <rpenyaev@...e.de>
> 
> -- 
> Roman
> 
> On 2020-05-01 21:15, Jason Baron wrote:
>> Now that the ep_events_available() check is done in a lockless way, and
>> we no longer perform wakeups from ep_scan_ready_list(), we need to ensure
>> that either ep->rdllist has items or the overflow list is active. Prior to:
>> commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested
>> epoll"), we did wake_up(&ep->wq) after manipulating the ep->rdllist and the
>> overflow list. Thus, any waiters would observe the correct state. However,
>> with that wake_up() now removed we need to be more careful to ensure that
>> condition.
>>
>> Here's an example of what could go wrong:
>>
>> We have epoll fds: epfd1, epfd2. And epfd1 is added to epfd2 and epfd2 is
>> added to a socket: epfd1->epfd2->socket. Thread a is doing epoll_wait() on
>> epfd1, and thread b is doing epoll_wait on epfd2. Then:
>>
>> 1) data comes in on socket
>>
>> ep_poll_callback() wakes up threads a and b
>>
>> 2) thread a runs
>>
>> ep_poll()
>>  ep_scan_ready_list()
>>   ep_send_events_proc()
>>    ep_item_poll()
>>      ep_scan_ready_list()
>>        list_splice_init(&ep->rdllist, &txlist);
>>
>> 3) now thread b is running
>>
>> ep_poll()
>>  ep_events_available()
>>    returns false
>>  schedule_hrtimeout_range()
>>
>> Thus, thread b has now scheduled and missed the wakeup.
>>
>> Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
>> Signed-off-by: Jason Baron <jbaron@...mai.com>
>> Cc: Alexander Viro <viro@...iv.linux.org.uk>
>> Cc: Heiher <r@....cc>
>> Cc: Roman Penyaev <rpenyaev@...e.de>
>> Cc: Khazhismel Kumykov <khazhy@...gle.com>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Davidlohr Bueso <dbueso@...e.de>
>> Cc: <stable@...r.kernel.org>
>> ---
>>  fs/eventpoll.c | 23 +++++++++++++++++------
>>  1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>> index aba03ee749f8..4af2d020f548 100644
>> --- a/fs/eventpoll.c
>> +++ b/fs/eventpoll.c
>> @@ -704,8 +704,14 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>>       * in a lockless way.
>>       */
>>      write_lock_irq(&ep->lock);
>> -    list_splice_init(&ep->rdllist, &txlist);
>>      WRITE_ONCE(ep->ovflist, NULL);
>> +    /*
>> +     * In ep_poll() we use ep_events_available() in a lockless way to decide
>> +     * if events are available. So we need to preserve that either
>> +     * ep->oflist != EP_UNACTIVE_PTR or there are events on the ep->rdllist.
>> +     */
>> +    smp_wmb();
>> +    list_splice_init(&ep->rdllist, &txlist);
>>      write_unlock_irq(&ep->lock);
>>
>>      /*
>> @@ -737,16 +743,21 @@ static __poll_t ep_scan_ready_list(struct eventpoll *ep,
>>          }
>>      }
>>      /*
>> +     * Quickly re-inject items left on "txlist".
>> +     */
>> +    list_splice(&txlist, &ep->rdllist);
>> +    /*
>> +     * In ep_poll() we use ep_events_available() in a lockless way to decide
>> +     * if events are available. So we need to preserve that either
>> +     * ep->oflist != EP_UNACTIVE_PTR or there are events on the ep->rdllist.
>> +     */
>> +    smp_wmb();
>> +    /*
>>       * We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
>>       * releasing the lock, events will be queued in the normal way inside
>>       * ep->rdllist.
>>       */
>>      WRITE_ONCE(ep->ovflist, EP_UNACTIVE_PTR);
>> -
>> -    /*
>> -     * Quickly re-inject items left on "txlist".
>> -     */
>> -    list_splice(&txlist, &ep->rdllist);
>>      __pm_relax(ep->ws);
>>      write_unlock_irq(&ep->lock);
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ