[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1357148750.21409.17169.camel@edumazet-glaptop>
Date: Wed, 02 Jan 2013 09:45:50 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Eric Wong <normalperson@...t.net>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Hans Verkuil <hans.verkuil@...co.com>,
Jiri Olsa <jolsa@...hat.com>, Jonathan Corbet <corbet@....net>,
Al Viro <viro@...iv.linux.org.uk>,
Davide Libenzi <davidel@...ilserver.org>,
Hans de Goede <hdegoede@...hat.com>,
Mauro Carvalho Chehab <mchehab@...radead.org>,
David Miller <davem@...emloft.net>,
Andrew Morton <akpm@...ux-foundation.org>,
Andreas Voellmy <andreas.voellmy@...e.edu>,
"Junchang(Jason) Wang" <junchang.wang@...e.edu>,
Network Development <netdev@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD
On Tue, 2013-01-01 at 23:56 +0000, Eric Wong wrote:
> Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> > Please document the barrier that this mb() pairs with, and then give
> > an explanation for the fix in the commit message, and I'll happily
> > take it. Even if it's just duplicating the comments above the
> > wq_has_sleeper() function, except modified for the ep_modify() case.
>
> Hopefully my explanation is correct and makes sense below,
> I think both effects of the barrier are needed
>
> > Of course, it would be good to get verification from Jason and Andreas
> > that the alternate patch also works for them.
>
> Jason just confirmed it.
>
> ------------------------------- 8< ----------------------------
> From 02f43757d04bb6f2786e79eecf1cfa82e6574379 Mon Sep 17 00:00:00 2001
> From: Eric Wong <normalperson@...t.net>
> Date: Tue, 1 Jan 2013 21:20:27 +0000
> Subject: [PATCH] epoll: prevent missed events on EPOLL_CTL_MOD
>
> EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
> ensure events are not missed. Since the modifications to the interest
> mask are not protected by the same lock as ep_poll_callback, we need to
> ensure the change is visible to other CPUs calling ep_poll_callback.
>
> We also need to ensure f_op->poll() has an up-to-date view of past
> events which occured before we modified the interest mask. So this
> barrier also pairs with the barrier in wq_has_sleeper().
>
> This should guarantee either ep_poll_callback or f_op->poll() (or both)
> will notice the readiness of a recently-ready/modified item.
>
> This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
> http://thread.gmane.org/gmane.linux.kernel/1408782/
>
> Signed-off-by: Eric Wong <normalperson@...t.net>
> Cc: Hans Verkuil <hans.verkuil@...co.com>
> Cc: Jiri Olsa <jolsa@...hat.com>
> Cc: Jonathan Corbet <corbet@....net>
> Cc: Al Viro <viro@...iv.linux.org.uk>
> Cc: Davide Libenzi <davidel@...ilserver.org>
> Cc: Hans de Goede <hdegoede@...hat.com>
> Cc: Mauro Carvalho Chehab <mchehab@...radead.org>
> Cc: David Miller <davem@...emloft.net>
> Cc: Eric Dumazet <eric.dumazet@...il.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Andreas Voellmy <andreas.voellmy@...e.edu>
> Tested-by: "Junchang(Jason) Wang" <junchang.wang@...e.edu>
> Cc: netdev@...r.kernel.org
> Cc: linux-fsdevel@...r.kernel.org
> ---
> fs/eventpoll.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index cd96649..39573ee 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -1285,7 +1285,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
> * otherwise we might miss an event that happens between the
> * f_op->poll() call and the new event set registering.
> */
> - epi->event.events = event->events;
> + epi->event.events = event->events; /* need barrier below */
> pt._key = event->events;
> epi->event.data = event->data; /* protected by mtx */
> if (epi->event.events & EPOLLWAKEUP) {
> @@ -1296,6 +1296,26 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
> }
>
> /*
> + * The following barrier has two effects:
> + *
> + * 1) Flush epi changes above to other CPUs. This ensures
> + * we do not miss events from ep_poll_callback if an
> + * event occurs immediately after we call f_op->poll().
> + * We need this because we did not take ep->lock while
> + * changing epi above (but ep_poll_callback does take
> + * ep->lock).
> + *
> + * 2) We also need to ensure we do not miss _past_ events
> + * when calling f_op->poll(). This barrier also
> + * pairs with the barrier in wq_has_sleeper (see
> + * comments for wq_has_sleeper).
> + *
> + * This barrier will now guarantee ep_poll_callback or f_op->poll
> + * (or both) will notice the readiness of an item.
> + */
> + smp_mb();
> +
> + /*
> * Get current event bits. We can safely use the file* here because
> * its usage count has been increased by the caller of this function.
> */
> --
> Eric Wong
First, thanks for working on this issue.
It seems the real problem is the epi->event.events = event->events;
which is done without taking ep->lock
While a smp_mb() could reduce the race window, I believe there is still
a race, and the following patch would close it.
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index be56b21..25e5c53 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1313,7 +1313,10 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
* otherwise we might miss an event that happens between the
* f_op->poll() call and the new event set registering.
*/
+ spin_lock_irq(&ep->lock);
epi->event.events = event->events;
+ spin_unlock_irq(&ep->lock);
+
pt._key = event->events;
epi->event.data = event->data; /* protected by mtx */
if (epi->event.events & EPOLLWAKEUP) {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists