Message-ID: <Pine.LNX.4.64.0810261225220.19212@alien.or.mcafeemobile.com>
Date: Sun, 26 Oct 2008 12:35:55 -0700 (PDT)
From: Davide Libenzi <davidel@...ilserver.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
cc: Arjan van de Ven <arjan@...radead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"Rafael J. Wysocki" <rjw@...k.pl>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: kerneloops.org: 2.6.28-rc regression in epoll (list corruption)
On Sun, 26 Oct 2008, Linus Torvalds wrote:
>
>
> On Sun, 26 Oct 2008, Arjan van de Ven wrote:
> >
> > This one is upcoming fast (and I just hit it as well)
> >
> > http://www.kerneloops.org/searchweek.php?search=ep_poll_callback
> >
> > seems epoll grew some list corruption....
>
> It sounds very much like f337b9c58332bdecde965b436e47ea4c94d30da0 ("epoll:
> drop unnecessary test") deleted a test that wasn't so unnecessary after
> all..
>
> That ep_poll_callback() code is:
>
> /* If this file is already in the ready list we exit soon */
> if (ep_is_linked(&epi->rdllink))
> goto is_linked;
>
> list_add_tail(&epi->rdllink, &ep->rdllist);
>
> and the unnecessary test that was removed looks _very_ much like that kind
> of code.
No, the test was in the re-insertion loop. This is the patch that fixes it:
---
fs/eventpoll.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
Index: linux-2.6.mod/fs/eventpoll.c
===================================================================
--- linux-2.6.mod.orig/fs/eventpoll.c 2008-10-17 15:51:09.000000000 -0700
+++ linux-2.6.mod/fs/eventpoll.c 2008-10-17 15:54:14.000000000 -0700
@@ -930,8 +930,15 @@
* inside the main ready-list here.
*/
for (nepi = ep->ovflist; (epi = nepi) != NULL;
- nepi = epi->next, epi->next = EP_UNACTIVE_PTR)
- list_add_tail(&epi->rdllink, &ep->rdllist);
+ nepi = epi->next, epi->next = EP_UNACTIVE_PTR) {
+ /*
+ * If the above loop quit with errors, the epoll item might still
+ * be linked to "txlist", and the list_splice() done below will
+ * take care of those cases.
+ */
+ if (!ep_is_linked(&epi->rdllink))
+ list_add_tail(&epi->rdllink, &ep->rdllist);
+ }
/*
* We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
* releasing the lock, events will be queued in the normal way inside
> And if somebody knows how to reproduce this reliably, it would be really
> good to hear if doing a revert on that thing just fixed it. It should
> revert cleanly - it's the only change to fs/eventpoll.c since 2.6.27.
That's 100% due to the test removal that went in when you pulled Andrew's
bits after .27.
The patch has been confirmed by me and by the bug submitters.
> I'm somewhat inclined to revert it without even getting confirmation,
> since I wanted to do an early -rc2 today with all the brown-paper-bag
> fixes that have accumulated. But it would be good to get some
> confirmation.
No need to revert, since one of the removed tests is really un-needed.
> Btw, that whole logic in ep_send_events() sounds a bit scary. It says:
>
> We can loop without lock because this is a task private list.
>
> and it's true that "txlist" is a private list, but it still seems to
> depend on the fact that none of the "struct epitem"s on that list can be
> reached by any other means. And that whole thing depends on the magic
> behaviour of 'ep->ovflist', but there are a lot of code sequences that do
> *not* seem to test that at all.
>
> Maybe the race/bug has always been there (ie some sequence that works
> with an "epi->rdllink" without checking ovflist), but the "unnecessary"
> test protected us from seeing it in practice.
No. The bug was introduced by removing the test. Thomas contacted me
saying that the test was unneeded, and at first sight it looked OK to drop
it (since while the loop runs, the callback inserts into 'ep->ovflist').
But we didn't notice that a premature loop exit might leave items linked
to 'txlist', which the splice below takes care of re-inserting. Adding such
an item to 'ep->rdllist' again from the 'ep->ovflist' loop, while it is
still linked, is what corrupts the list.
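To make the failure mode concrete, here is a minimal userspace sketch, not
the kernel code. The helpers init_list(), add_tail(), splice() and walk()
are simplified stand-ins written for this illustration, and requeue_demo()
is a hypothetical name. It assumes a single item that is still linked on
'txlist' after a premature loop exit and that also sits on the overflow
chain. With the guard, the item reaches the ready list exactly once via the
splice. Without it, the ready list ends up with a node pointing at itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel list helpers (illustration only). */
struct list_head { struct list_head *next, *prev; };

static void init_list(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void add_tail(struct list_head *n, struct list_head *head)
{
	struct list_head *prev = head->prev;

	n->next = head;
	n->prev = prev;
	prev->next = n;
	head->prev = n;
}

/* Move every entry of 'list' to the front of 'head'. */
static void splice(struct list_head *list, struct list_head *head)
{
	struct list_head *first, *last, *at;

	if (list_is_empty(list))
		return;
	first = list->next;
	last = list->prev;
	at = head->next;
	first->prev = head;
	head->next = first;
	last->next = at;
	at->prev = last;
}

/* Count entries; return -1 if the walk does not get back to 'head'. */
static int walk(const struct list_head *head, int limit)
{
	const struct list_head *p;
	int n = 0;

	assert(limit > 0);
	for (p = head->next; p != head; p = p->next)
		if (++n > limit)
			return -1;
	return n;
}

/* Hypothetical model of an epoll item: ready-list link plus overflow link. */
struct item { struct list_head rdllink; struct item *next; };

/* Returns the ready-list length, or -1 if the list got corrupted. */
int requeue_demo(int guarded)
{
	struct list_head rdllist, txlist;
	struct item a, *epi, *nepi;

	init_list(&rdllist);
	init_list(&txlist);
	init_list(&a.rdllink);
	add_tail(&a.rdllink, &txlist);	/* left behind by the early exit */
	a.next = NULL;			/* 'a' is the whole overflow chain */

	for (epi = &a; epi != NULL; epi = nepi) {
		nepi = epi->next;
		epi->next = NULL;	/* stands in for EP_UNACTIVE_PTR */
		/* The guard: skip items that are still linked somewhere. */
		if (!guarded || list_is_empty(&epi->rdllink))
			add_tail(&epi->rdllink, &rdllist);
	}
	splice(&txlist, &rdllist);	/* re-inserts what is left on txlist */
	return walk(&rdllist, 8);
}
```

Calling requeue_demo(1) models the patched loop and requeue_demo(0) models
the loop with the test removed.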
You can speed things up by taking the patch directly, without waiting for
Andrew's pull.
- Davide