Date:	Thu, 21 Nov 2013 15:26:17 +0200
From:	Eliezer Tamir <eliezer.tamir@...ux.intel.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	Arjan van de Ven <arjan@...ux.intel.com>, lenb@...nel.org,
	rjw@...ysocki.net, Chris Leech <christopher.leech@...el.com>,
	David Miller <davem@...emloft.net>, rui.zhang@...el.com,
	jacob.jun.pan@...ux.intel.com,
	Mike Galbraith <bitbucket@...ine.de>,
	Ingo Molnar <mingo@...nel.org>, hpa@...or.com,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [PATCH 6/7] sched: Clean up preempt_enable_no_resched() abuse

On 21/11/2013 12:10, Peter Zijlstra wrote:
> On Wed, Nov 20, 2013 at 08:02:54PM +0200, Eliezer Tamir wrote:
>> IMHO This has been reviewed thoroughly.
>>
>> When Ben Hutchings voiced concerns I rewrote the code to use time_after,
>> so even if you do get switched over to a CPU where the time is random
>> you will at most poll another full interval.
>>
>> Linus asked me to remove this since it makes us use two time values
>> instead of one. see https://lkml.org/lkml/2013/7/8/345.
> 
> I'm not sure I see how this would be true.
> 
> So the do_select() code basically does:
> 
>   for (;;) {
> 
>     /* actual poll loop */
> 
>     if (!need_resched()) {
>       if (!busy_end) {
> 	busy_end = now() + busypoll;
> 	continue;
>       }
>       if (!((long)(busy_end - now()) < 0))
> 	continue;
>     }
> 
>     /* go sleep */
> 
>   }
> 
> So imagine our CPU0 timebase is 1 minute ahead of CPU1 (60e9 vs 0), and we start by:
> 
>   busy_end = now() + busypoll; /* CPU0: 60e9 + d */
> 
> but then we migrate to CPU1 and do:
> 
>   busy_end - now() /* CPU1: 60e9 + d' */
> 
> and find we're still a minute out; and in fact we'll keep spinning for
> that entire minute barring a need_resched().

Not exactly; poll will return if there are any events to report or if
a signal is pending.

> Surely that's not intended and desired?

This limit is an extra safety net: because busy polling is expensive,
we limit the time we are willing to spend doing it.

We don't override any limit the user has put on the system call.
A signal or having events to report will also stop the looping.
So we are mostly capping the resources an _idle_ system will waste
on busy polling.

We want to globally cap the amount of time the system busy polls, on
average. Nothing catastrophic will happen on the extremely rare occasion
that we miss.

The alternative is to carry one more int on every poll/select, all the
time; that seems like the bigger cost.