Message-ID: <463C1A66.3080806@cosmosbay.com>
Date:	Sat, 05 May 2007 07:47:18 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Davi Arnaut <davi@...ent.com.br>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Davide Libenzi <davidel@...ilserver.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] rfc: threaded epoll_wait thundering herd

Linus Torvalds wrote:
> 
> On Sat, 5 May 2007, Eric Dumazet wrote:
>> But... what happens if the thread that was chosen exits from the loop in
>> ep_poll() with res = -EINTR (because of signal_pending(current))
> 
> Not a problem.
> 
> What happens is that an exclusive wake-up stops on the first entry in the 
> wait-queue that it actually *wakes*up*, but if some task has just marked 
> itself as being TASK_UNINTERRUPTIBLE, but is still on the run-queue, it 
> will just be marked TASK_RUNNING and that in itself isn't enough to cause 
> the "exclusive" test to trigger.
> 
> The code in sched.c is subtle, but worth understanding if you care about 
> these things. You should look at:
> 
>  - try_to_wake_up() - this is the default wakeup function (and the one 
>    that should work correctly - I'm not going to guarantee that any of the 
>    other specialty-wakeup-functions do so)
> 
>    The return value is the important thing. Returning non-zero is 
>    "success", and implies that we actually activated it.
> 
>    See the "goto out_running" case for the case where the process was 
>    still actually on the run-queues, and we just ended up setting 
>    "p->state = TASK_RUNNING" - we still return 0, and the "exclusive" 
>    logic will not trigger.
> 
>  - __wake_up_common: this is the thing that _calls_ the above, and which 
>    cares about the return value above. It does
> 
> 	if (curr->func(curr, mode, sync, key) &&
> 		(flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
> 
> 
>    ie it only decrements (and triggers) the nr_exclusive thing when the 
>    wakeup-function returned non-zero (and when the waitqueue entry was 
>    marked exclusive, of course).
> 
> So what does all this subtlety *mean*?
> 
> Walk through it. It means that it is safe to do the
> 
> 	if (signal_pending())
> 		return -EINTR;
> 
> kind of thing, because *when* you do this, you obviously are always on the 
> run-queue (otherwise the process wouldn't be running, and couldn't be 
> doing the test). So if there is somebody else waking you up right then and 
> there, they'll never count your wakeup as an exclusive one, and they will 
> wake up at least one other real exclusive waiter.
> 
> (IOW, you get a very very small probability of a very very small 
> "thundering herd" - obviously it won't be "thundering" any more, it will 
> be more of a "whispering herdlet").
> 
> The Linux kernel sleep/wakeup thing is really quite nifty and smart. And 
> very few people realize just *how* nifty and elegant (and efficient) it 
> is. Hopefully a few more people appreciate its beauty and subtlety now ;)
> 
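
As an illustration of the exclusive-wakeup rule described above, here is a rough user-space model. It is only a sketch, not the kernel code: struct waiter, model_wake_function() and model_wake_up_common() are hypothetical names, and the on_runqueue flag stands in for the "goto out_running" case where the wakeup function returns 0.

```c
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

/* Hypothetical stand-in for a wait-queue entry plus its task. */
struct waiter {
	const char *name;
	unsigned int flags;
	int on_runqueue;   /* 1: task is still running (e.g. about to test signal_pending()) */
	int state_running; /* set to 1 once the model marks the task TASK_RUNNING */
};

/*
 * Models the return convention described above: the task is always
 * marked running, but only a task that was really asleep counts as
 * "activated" (non-zero return).
 */
static int model_wake_function(struct waiter *w)
{
	w->state_running = 1;
	return !w->on_runqueue;
}

/*
 * Walks the queue in order. Only a successful wakeup of an exclusive
 * waiter decrements nr_exclusive; the walk stops when it reaches zero.
 * Returns the number of tasks actually activated.
 */
static int model_wake_up_common(struct waiter *q, size_t n, int nr_exclusive)
{
	int activated = 0;

	for (size_t i = 0; i < n; i++) {
		struct waiter *curr = &q[i];
		unsigned int flags = curr->flags;
		int ret = model_wake_function(curr);

		activated += ret;
		if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}
	return activated;
}
```

With a queue of three exclusive waiters where the first is still on the run-queue, a call with nr_exclusive = 1 marks the first one running, does not count it, and goes on to activate the second; the third is left asleep.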

Thank you Linus for these detailed explanations.

I think what frightened me was not the wakeup logic, but the possibility on 
SMP that a signal could be delivered to the thread just after it has been 
selected.

Looking again at ep_poll(), I see:

	set_current_state(TASK_INTERRUPTIBLE);
[*]	if (!list_empty(&ep->rdllist) || !jtimeout)
		break;
	if (signal_pending(current)) {
		res = -EINTR;
		break;
	}

So the test against signal_pending() is not done if an event is present in 
the ready list: the event should be delivered even if a signal is pending. I 
missed this bit earlier...
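
To make the ordering explicit, here is a small user-space sketch of the decision taken on each pass through that loop. It is not the ep_poll() code itself: enum poll_step and model_ep_poll_step() are hypothetical names, and the three booleans/values stand in for list_empty(&ep->rdllist), jtimeout and signal_pending(current).

```c
#include <stdbool.h>

/* Hypothetical outcomes of one pass through the wait loop. */
enum poll_step {
	STEP_LEAVE_LOOP, /* break with res untouched: events ready or timeout expired */
	STEP_EINTR,      /* break with res = -EINTR */
	STEP_SLEEP       /* neither test fired: the thread would go on to sleep */
};

/*
 * Mirrors the order of the two tests: the ready list (and timeout) is
 * checked first, so a pending signal is only looked at when there is
 * nothing to deliver.
 */
static enum poll_step model_ep_poll_step(bool ready_list_nonempty,
					 long jtimeout,
					 bool signal_is_pending)
{
	if (ready_list_nonempty || !jtimeout)
		return STEP_LEAVE_LOOP;
	if (signal_is_pending)
		return STEP_EINTR;
	return STEP_SLEEP;
}
```

In this model, a thread that has both a ready event and a pending signal leaves the loop through the first test, so -EINTR is only returned when the ready list is empty and the timeout has not expired.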


