Message-ID: <Y38h9oe4ZEGNd7Zx@quatroqueijos.cascardo.eti.br>
Date:   Thu, 24 Nov 2022 04:49:10 -0300
From:   Thadeu Lima de Souza Cascardo <cascardo@...onical.com>
To:     Rishabh Bhatnagar <risbhat@...zon.com>
Cc:     gregkh@...uxfoundation.org, shakeelb@...gle.com,
        viro@...iv.linux.org.uk, bsegall@...gle.com, mdecandia@...il.com,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org,
        Soheil Hassas Yeganeh <soheil@...gle.com>,
        Guantao Liu <guantaol@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Khazhismel Kumykov <khazhy@...gle.com>,
        Davidlohr Bueso <dbueso@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 2/2] epoll: check for events when removing a timed out
 thread from the wait queue

On Thu, Nov 24, 2022 at 12:11:23AM +0000, Rishabh Bhatnagar wrote:
> From: Soheil Hassas Yeganeh <soheil@...gle.com>
> 
> Commit 289caf5d8f6c61c6d2b7fd752a7f483cd153f182 upstream.
> 
> Patch series "simplify ep_poll".
> 
> This patch series is a followup based on the suggestions and feedback by
> Linus:
> https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com
> 
> The first patch in the series is a fix for the epoll race in presence of
> timeouts, so that it can be cleanly backported to all affected stable
> kernels.
> 
> The rest of the patch series simplifies the ep_poll() implementation.  Some
> of these simplifications result in minor performance enhancements as well.
> We have kept these changes under self tests and internal benchmarks for a
> few days, and there are minor (1-2%) performance enhancements as a result.
> 
> This patch (of 8):
> 
> After abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2)
> timeout"), we break out of the ep_poll loop upon timeout, without checking
> whether there are any new events available.  Prior to that patch-series we
> always called ep_events_available() after exiting the loop.
> 
> This can cause races and missed wakeups.  For example, consider the
> following scenario reported by Guantao Liu:
> 
> Suppose we have an eventfd added using EPOLLET to an epollfd.
> 
> Thread 1: Sleeps for just below 5ms and then writes to an eventfd.
> Thread 2: Calls epoll_wait with a timeout of 5 ms. If it sees an
>           event of the eventfd, it will write back on that fd.
> Thread 3: Calls epoll_wait with a negative timeout.
> 
> Prior to abc610e01c66, it is guaranteed that Thread 3 will wake up either
> by Thread 1 or Thread 2.  After abc610e01c66, Thread 3 can be blocked
> indefinitely if Thread 2 sees a timeout right before the write to the
> eventfd by Thread 1.  Thread 2 will be woken up from
> schedule_hrtimeout_range and, with eavail 0, it will not call
> ep_send_events().
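
For anyone who wants to exercise the scenario above from userspace, here is
a rough sketch of the three threads. It is not the reporter's test case. The
names run_scenario_once() and GUARD_TIMEOUT_MS are made up for illustration,
the 4900 us sleep is just one reading of "just below 5ms", and Thread 3 uses
a long finite timeout instead of a negative one so that the program still
terminates on an affected kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical guard: stands in for the negative timeout of Thread 3. */
#define GUARD_TIMEOUT_MS 2000

static int epfd, efd;

/* Thread 1: sleep for just below 5 ms, then write to the eventfd. */
static void *writer(void *arg)
{
	uint64_t one = 1;

	(void)arg;
	usleep(4900);
	if (write(efd, &one, sizeof(one)) != sizeof(one))
		perror("write");
	return NULL;
}

/* Thread 2: epoll_wait with a 5 ms timeout; on an event, write back. */
static void *timed_waiter(void *arg)
{
	struct epoll_event ev;
	uint64_t one = 1;

	(void)arg;
	if (epoll_wait(epfd, &ev, 1, 5) == 1 &&
	    write(efd, &one, sizeof(one)) != sizeof(one))
		perror("write");
	return NULL;
}

/* Thread 3: the waiter that must be woken by Thread 1 or Thread 2. */
static void *blocking_waiter(void *arg)
{
	struct epoll_event ev;
	int *woke = arg;

	*woke = epoll_wait(epfd, &ev, 1, GUARD_TIMEOUT_MS) == 1;
	return NULL;
}

/* Returns 1 if Thread 3 saw an event, 0 if it hit the guard timeout. */
static int run_scenario_once(void)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
	pthread_t t1, t2, t3;
	int woke = 0, rc;

	efd = eventfd(0, EFD_NONBLOCK);
	epfd = epoll_create1(0);
	assert(efd >= 0 && epfd >= 0);
	ev.data.fd = efd;
	rc = epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
	assert(rc == 0);

	rc = pthread_create(&t3, NULL, blocking_waiter, &woke);
	assert(rc == 0);
	rc = pthread_create(&t2, NULL, timed_waiter, NULL);
	assert(rc == 0);
	rc = pthread_create(&t1, NULL, writer, NULL);
	assert(rc == 0);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_join(t3, NULL);
	close(epfd);
	close(efd);
	return woke;
}
```

A single run is unlikely to land in the race window, so on a kernel with
abc610e01c66 and without this fix one would presumably call
run_scenario_once() in a loop from main() and look for a 0 return. With the
fix applied it should return 1 every time.
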
> 
> To fix this issue:
> 1) Simplify the timed_out case as suggested by Linus.
> 2) While holding the lock, recheck whether the thread was woken up
>    after its timeout expired.
> 
> Note that (2) is different from Linus' original suggestion: It does not set
> "eavail = ep_events_available(ep)" to avoid unnecessary contention (when
> there are too many timed-out threads and a small number of events), as
> well as races mentioned in the discussion thread.
> 
> This is the first patch in the series so that the backport to stable
> releases is straightforward.
> 
> Link: https://lkml.kernel.org/r/20201106231635.3528496-1-soheil.kdev@gmail.com
> Link: https://lkml.kernel.org/r/CAHk-=wizk=OxUyQPbO8MS41w2Pag1kniUV5WdD5qWL-gq1kjDA@mail.gmail.com
> Link: https://lkml.kernel.org/r/20201106231635.3528496-2-soheil.kdev@gmail.com
> Fixes: abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout")
> Signed-off-by: Soheil Hassas Yeganeh <soheil@...gle.com>
> Tested-by: Guantao Liu <guantaol@...gle.com>
> Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Reported-by: Guantao Liu <guantaol@...gle.com>
> Reviewed-by: Eric Dumazet <edumazet@...gle.com>
> Reviewed-by: Willem de Bruijn <willemb@...gle.com>
> Reviewed-by: Khazhismel Kumykov <khazhy@...gle.com>
> Reviewed-by: Davidlohr Bueso <dbueso@...e.de>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Signed-off-by: Rishabh Bhatnagar <risbhat@...zon.com>

Acked-by: Thadeu Lima de Souza Cascardo <cascardo@...onical.com>

> ---
>  fs/eventpoll.c | 25 ++++++++++++++++---------
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index e5496483a882..877f9f61a4e8 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1928,23 +1928,30 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>  		}
>  		write_unlock_irq(&ep->lock);
>  
> -		if (eavail || res)
> -			break;
> +		if (!eavail && !res)
> +			timed_out = !schedule_hrtimeout_range(to, slack,
> +							      HRTIMER_MODE_ABS);
>  
> -		if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS)) {
> -			timed_out = 1;
> -			break;
> -		}
> -
> -		/* We were woken up, thus go and try to harvest some events */
> +		/*
> +		 * We were woken up, thus go and try to harvest some events.
> +		 * If timed out and still on the wait queue, recheck eavail
> +		 * carefully under lock, below.
> +		 */
>  		eavail = 1;
> -
>  	} while (0);
>  
>  	__set_current_state(TASK_RUNNING);
>  
>  	if (!list_empty_careful(&wait.entry)) {
>  		write_lock_irq(&ep->lock);
> +		/*
> +		 * If the thread timed out and is not on the wait queue, it
> +		 * means that the thread was woken up after its timeout expired
> +		 * before it could reacquire the lock. Thus, when wait.entry is
> +		 * empty, it needs to harvest events.
> +		 */
> +		if (timed_out)
> +			eavail = list_empty(&wait.entry);
>  		__remove_wait_queue(&ep->wq, &wait);
>  		write_unlock_irq(&ep->lock);
>  	}
> -- 
> 2.37.1
> 
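
To make the post-loop recheck in the hunk above easier to follow, here is a
small userspace model of it. This is only a sketch under assumptions: struct
list_head and its helpers are simplified stand-ins for the kernel ones, and
eavail_after_wait() and its before_lock hook are hypothetical names used to
simulate a waker running between the lockless check and taking ep->lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_entry(struct list_head *e, struct list_head *head)
{
	e->next = head->next;
	e->prev = head;
	head->next->prev = e;
	head->next = e;
}

/* Unlink the entry and reinitialise it, like list_del_init(). */
static void list_del_entry_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

/*
 * Model of what ep_poll() does after the wait loop with this patch.
 * before_lock, if non-NULL, simulates a waker that runs after the
 * lockless check but before the waiter gets the lock.
 */
static bool eavail_after_wait(bool timed_out, struct list_head *wait_entry,
			      void (*before_lock)(struct list_head *))
{
	bool eavail = true;	/* the loop leaves eavail = 1 */

	if (!list_is_empty(wait_entry)) {	/* list_empty_careful() */
		if (before_lock)
			before_lock(wait_entry);
		/* write_lock_irq(&ep->lock) would be taken here */
		if (timed_out)
			eavail = list_is_empty(wait_entry);
		list_del_entry_init(wait_entry);	/* __remove_wait_queue() */
		/* write_unlock_irq(&ep->lock) */
	}
	return eavail;
}
```

In this model a timed-out waiter that is still queued ends up with eavail
false, while one that a waker dequeued just before the lock was taken ends
up with eavail true and goes on to harvest events. Passing
list_del_entry_init as before_lock simulates that late waker.
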
