linux-kernel - Re: [PATCH] doc: add note on usleep

Message-ID: <20170107194150.GA22557@osadl.at>
Date:   Sat, 7 Jan 2017 19:41:50 +0000
From:   Nicholas Mc Guire <der.herr@...r.at>
To:     Pavel Machek <pavel@....cz>
Cc:     Nicholas Mc Guire <hofrat@...dl.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH] doc: add note on usleep_range range

On Tue, Dec 27, 2016 at 10:56:26PM +0100, Pavel Machek wrote:
> On Tue 2016-12-13 04:58:43, Nicholas Mc Guire wrote:
> > useleep_range() with a delta of 0 makes no sense and only prevents the
> > timer subsystem from optimizing interrupts. As any user of usleep_range()
> > is in non-atomic context the timer jitter is in the range of 10s of 
> > microseconds anyway.
> > 
> > This adds a note making it clear that a range of 0 is a bad idea.
> > 
> > Signed-off-by: Nicholas Mc Guire <hofrat@...dl.org>
> > ---
> > 
> > as of 4.9.0 there are about 20 cases of usleep_ranges() that have 
> > min==max and none of them really look like they are necessary, so 
> > it does seem like a relatively common misunderstanding worth
> > noting in the documentation.
> > 
> > Patch is against 4.9.0 (localversion-next is 20161212)
> > 
> >  Documentation/timers/timers-howto.txt | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/Documentation/timers/timers-howto.txt b/Documentation/timers/timers-howto.txt
> > index 038f8c7..b5cdf82 100644
> > --- a/Documentation/timers/timers-howto.txt
> > +++ b/Documentation/timers/timers-howto.txt
> > @@ -93,6 +93,13 @@ NON-ATOMIC CONTEXT:
> >  			tolerances here are very situation specific, thus it
> >  			is left to the caller to determine a reasonable range.
> >  
> > +			A range of 0, that is usleep_range(100,100) or the 
> > +			like, do not make sense as this code is in a 
> > +			non-atomic section and a system can not be expected 
> > +			to have jitter 0. For any non-RT code any delta
> 
> Would it be possible to fix english here?

Agreed, that is crappy language - my bad - will fix - thanks!

> 
> "to have zero jitter" at least. I believe it is "does not".
> 
> I don't see how atomic vs. non-atomic context makes difference. There
> are sources of jitter that affect atomic context...

The relevance is that, while there is jitter in atomic context, it can
be quite small (depending on your hardware and the specifics of the
system config), but in non-atomic context the jitter is so large that
it makes no relevant difference if you give usleep_range() a slack of
a few microseconds.
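
To make the "slack" argument concrete, here is a minimal userspace sketch
of the idea, not kernel code. It assumes a simplified model: a sleeper
with the window [min_us, max_us] is woken by the first timer interrupt
that is already scheduled inside that window, and only needs an interrupt
of its own (at max_us) if there is none. The function name
model_wakeup_us() and the irq_us[] array of pending interrupt times are
hypothetical and exist only for illustration.

```c
#include <stddef.h>

/*
 * Simplified model (an assumption, not the kernel implementation):
 * return the time at which a sleeper with window [min_us, max_us]
 * wakes up, given the times of interrupts that are pending anyway.
 * *extra_irq is set to 1 if no pending interrupt falls inside the
 * window, so a dedicated interrupt at max_us is needed.
 */
static unsigned long model_wakeup_us(unsigned long min_us,
                                     unsigned long max_us,
                                     const unsigned long *irq_us,
                                     size_t n, int *extra_irq)
{
        unsigned long best = max_us;
        int shared = 0;
        size_t i;

        for (i = 0; i < n; i++) {
                /* only interrupts inside the window can be shared */
                if (irq_us[i] >= min_us && irq_us[i] <= max_us &&
                    (!shared || irq_us[i] < best)) {
                        best = irq_us[i];
                        shared = 1;
                }
        }
        *extra_irq = !shared;
        return best;
}
```

With pending interrupts at 150, 230 and 400 us, a window of 200,250 can
share the interrupt at 230 us in this model, while 200,200 cannot share
any of them and forces one more interrupt.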

> 
> > +			less than 50 microseconds probably is only preventing
> > +			timer subsystem optimization but providing no benefit.
> 
> And I don't trust you here. _If_ it prevents timer optimalization,
> _then_ it provides benefit, at least in the average case.
>
here is the data:

System: Intel Core i7 CPU 920 @ 2.67GHz Octocore
OS: Debian 8.1 (but that's quite irrelevant)
Kernel: 4.10-rc2 (localversion-next next-20170106)
config: x86_64_defconfig (Voluntary | Preempt)

Test setup - popped this into a kernel module and just
brute-force load/unload it in a loop - not very elegant,
but it does the job.

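#include <linux/module.h>
#include <linux/delay.h>
#include <linux/ktime.h>

static int __init usleep_test_init(void)
{
        ktime_t now, last;
        unsigned long min = 200, max = 250;     /* range under test, in us */

        last = ktime_get();
        usleep_range(min, max);
        now = ktime_get();
        /* elapsed time in ns, one sample per module load */
        printk(KERN_INFO "%lld\n", ktime_to_ns(ktime_sub(now, last)));
        return 0;
}
module_init(usleep_test_init);

/* empty exit handler so the module can be unloaded and reloaded */
static void __exit usleep_test_exit(void)
{
}
module_exit(usleep_test_exit);
MODULE_LICENSE("GPL");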

Results:

usleep_range() 5000 samples - idle system 
 100,200         200,200         190,200
 Min.   :188481  Min.   :201917  Min.   :197793
 1st Qu.:207062  1st Qu.:207057  1st Qu.:207051
 Median :207139  Median :207133  Median :207133
 Mean   :207254  Mean   :207233  Mean   :207244
 3rd Qu.:207341  3rd Qu.:207262  3rd Qu.:207610
 Max.   :225340  Max.   :214222  Max.   :214885

Going from 100,200 to 200,200 may have a relevant impact on
some systems with respect to the outliers, but mean and
median are almost the same. Between 190,200 and 200,200
there is no statistically significant difference with
respect to performance.
Note that the timestamps before and after also have
jitter, so only part of the jitter can be attributed
to usleep_range() itself. But idle system optimization
is not that interesting for most systems.
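
For anyone who wants to reproduce summaries of this shape from the raw
printk samples, a minimal sketch in plain C follows. The layout of the
tables (Min, 1st Qu., Median, Mean, 3rd Qu., Max) resembles what R's
summary() prints; that the quartiles were computed by linear
interpolation between order statistics is an assumption here, and the
names struct summary, quantile() and summarize() are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct summary {
        double min, q1, median, mean, q3, max;
};

static int cmp_double(const void *a, const void *b)
{
        double x = *(const double *)a, y = *(const double *)b;

        return (x > y) - (x < y);
}

/* Quantile p (0..1) of sorted v[0..n-1], linear interpolation. */
static double quantile(const double *v, size_t n, double p)
{
        double h = (double)(n - 1) * p;
        size_t lo = (size_t)h;
        size_t hi = lo + 1 < n ? lo + 1 : lo;

        return v[lo] + (h - (double)lo) * (v[hi] - v[lo]);
}

/* Sorts v in place and fills *s; n must be at least 1. */
static void summarize(double *v, size_t n, struct summary *s)
{
        double sum = 0.0;
        size_t i;

        qsort(v, n, sizeof(*v), cmp_double);
        for (i = 0; i < n; i++)
                sum += v[i];

        s->min = v[0];
        s->q1 = quantile(v, n, 0.25);
        s->median = quantile(v, n, 0.50);
        s->mean = sum / (double)n;
        s->q3 = quantile(v, n, 0.75);
        s->max = v[n - 1];
}
```

Feeding it the 5000 nanosecond values of one run (one per module load)
gives one column of such a table.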

On a loaded box:
Load here means that 8 find / | grep bla loops were started 
(not pinned to any particular core) and then the 
usleep_range() test ran 5000 times.

Same setup as above, but this time we differentiate between
PREEMPT and PREEMPT_VOLUNTARY

CONFIG_PREEMPT_VOLUNTARY=y
usleep_range() 5000 samples - load ~ 8
 100,200         190,200          200,200
 Min.   : 107812 Min.   :  203307 Min.   :  203432  
 1st Qu.: 558221 1st Qu.:  557490 1st Qu.:  510356  
 Median :1123425 Median : 1121939 Median : 1123316  
 Mean   :1103718 Mean   : 1100965 Mean   : 1100542  
 3rd Qu.:1541986 3rd Qu.: 1531478 3rd Qu.: 1517414  
 Max.   :8979183 Max.   :13765789 Max.   :12476136  

CONFIG_PREEMPT=y
usleep_range() 5000 samples - load ~ 8
 100,200          190,200          200,200
 Min.   :  115321 Min.   :  203963 Min.   :  203864  
 1st Qu.:  510296 1st Qu.:  451479 1st Qu.:  548131  
 Median : 1148660 Median : 1062576 Median : 1145228  
 Mean   : 1193449 Mean   : 1079379 Mean   : 1154728  
 3rd Qu.: 1601552 3rd Qu.: 1378622 3rd Qu.: 1570742  
 Max.   :12936192 Max.   :12346313 Max.   :13858732  
		
So for a loaded system it simply makes no relevant difference
whether you grant the subsystem a 10 microsecond range or not.
In fact, if one wanted 200 us and allowed for 200,250, it would
be quite hard to notice the difference.

 usleep_range(200,200) vs  usleep_range(200,250) 
 Min.   :  203864          Min.   :  214003  
 1st Qu.:  548131          1st Qu.:  520436  
 Median : 1145228          Median : 1138698  
 Mean   : 1154728          Mean   : 1201871  
 3rd Qu.: 1570742          3rd Qu.: 1581952  
 Max.   :13858732          Max.   :12491198  

I would call the difference insignificant - plotted as curves you
can hardly tell the distributions apart. As soon as you are looking
at more than a single task to optimize, the difference would
probably disappear completely.
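
To put numbers on "hard to notice", the rows of the 200,200 vs 200,250
table can be compared as relative changes. This is only a small sketch of
that arithmetic; the helper name rel_diff_pct() is hypothetical, and a
relative change per row is not a significance test.

```c
/*
 * Relative change from a to b, in percent of a.
 * a must be non-zero.
 */
static double rel_diff_pct(double a, double b)
{
        return (b - a) / a * 100.0;
}
```

Applied to the table above, rel_diff_pct(1145228, 1138698) for the
median gives roughly -0.6 percent and rel_diff_pct(1154728, 1201871)
for the mean roughly +4 percent, on samples whose quartiles span more
than a millisecond.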

thx!
hofrat