[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <463C1A66.3080806@cosmosbay.com>
Date: Sat, 05 May 2007 07:47:18 +0200
From: Eric Dumazet <dada1@...mosbay.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Davi Arnaut <davi@...ent.com.br>,
Andrew Morton <akpm@...ux-foundation.org>,
Davide Libenzi <davidel@...ilserver.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] rfc: threaded epoll_wait thundering herd
Linus Torvalds a écrit :
>
> On Sat, 5 May 2007, Eric Dumazet wrote:
>> But... what happens if the thread that was chosen exits from the loop in
>> ep_poll() with res = -EINTR (because of signal_pending(current))
>
> Not a problem.
>
> What happens is that an exclusive wake-up stops on the first entry in the
> wait-queue that it actually *wakes*up*, but if some task has just marked
> itself as being TASK_UNINTERRUPTIBLE, but is still on the run-queue, it
> will just be marked TASK_RUNNING and that in itself isn't enough to cause
> the "exclusive" test to trigger.
>
> The code in sched.c is subtle, but worth understanding if you care about
> these things. You should look at:
>
> - try_to_wake_up() - this is the default wakeup function (and the one
> that should work correctly - I'm not going to guarantee that any of the
> other specialty-wakeup-functions do so)
>
> The return value is the important thing. Returning non-zero is
> "success", and implies that we actually activated it.
>
> See the "goto out_running" case for the case where the process was
> still actually on the run-queues, and we just ended up setting
> "p->state = TASK_RUNNING" - we still return 0, and the "exclusive"
> logic will not trigger.
>
> - __wake_up_common: this is the thing that _calls_ the above, and which
> cares about the return value above. It does
>
> if (curr->func(curr, mode, sync, key) &&
> (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
>
>
> ie it only decrements (and triggers) the nr_exclusive thing when the
> wakeup-function returned non-zero (and when the waitqueue entry was
> marked exclusive, of course).
>
> So what does all this subtlety *mean*?
>
> Walk through it. It means that it is safe to do the
>
> if (signal_pending())
> return -EINTR;
>
> kind of thing, because *when* you do this, you obviously are always on the
> run-queue (otherwise the process wouldn't be running, and couldn't be
> doing the test). So if there is somebody else waking you up right then and
> there, they'll never count your wakeup as an exclusive one, and they will
> wake up at least one other real exclusive waiter.
>
> (IOW, you get a very very small probability of a very very small
> "thundering herd" - obviously it won't be "thundering" any more, it will
> be more of a "whispering herdlet").
>
> The Linux kernel sleep/wakeup thing is really quite nifty and smart. And
> very few people realize just *how* nifty and elegant (and efficient) it
> is. Hopefully a few more people appreciate its beauty and subtlety now ;)
>
Thank you Linus for these detailed explanations.
I think I was frightened not by the wakeup logic, but by the possibility in
SMP that a signal could be delivered to the thread just after it has been
selected.
Looking again at ep_poll(), I see :
set_current_state(TASK_INTERRUPTIBLE);
[*] if (!list_empty(&ep->rdllist) || !jtimeout)
break;
if (signal_pending(current)) {
res = -EINTR;
break;
}
So the test against signal_pending() is not done if an event is present in
ready list : It should be delivered even if a signal is pending. I missed this
bit ealier...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists