linux-kernel - Re: [PATCH] doc: add note on usleep

Message-ID: <20170111085007.GA13195@osadl.at>
Date:   Wed, 11 Jan 2017 08:50:07 +0000
From:   Nicholas Mc Guire <der.herr@...r.at>
To:     Pavel Machek <pavel@....cz>
Cc:     Nicholas Mc Guire <hofrat@...dl.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH] doc: add note on usleep_range range

On Tue, Jan 10, 2017 at 10:25:29PM +0100, Pavel Machek wrote:
> Hi!
> 
> > > "to have zero jitter" at least. I believe it is "does not".
> > > 
> > > I don't see how atomic vs. non-atomic context makes difference. There
> > > are sources of jitter that affect atomic context...
> > 
> > The relevance is that while there is jitter in atomic context it can
> > be quite small (depending on your hardware and the specifics of system
> > config) but in non-atomic context the jitter is so large that it
> > makes no relevant difference if you give usleep_range slack of a few
> > microseconds.
> 
> I disagree here. Even in non-atomic code, you'll get _no_ jitter most
> of the time. If you care about average case, small slack may still
> make sense.

yes - that's what the results say - the mean does not differ significantly,
so if you care about the average case - it makes no difference.

> 
> > > > +			less than 50 microseconds probably is only preventing
> > > > +			timer subsystem optimization but providing no benefit.
> > > 
> > > And I don't trust you here. _If_ it prevents timer optimization,
> > > _then_ it provides benefit, at least in the average case.
> > >
> > here is the data:
> > 
> > System: Intel Core i7 CPU 920 @ 2.67GHz Octocore
> > OS: Debian 8.1 (but that's quite irrelevant)
> > Kernel: 4.10-rc2 (localversion-next next-20170106)
> > config: x86_64_defconfig (Voluntary | Preempt)
> > 
> > Test-setup - popped this into a kernel module and just
> > brute force load/unload it in a loop - not very elegant
> > but it does the job.
> > 
> > static int __init usleep_test_init(void)
> > {
> >         ktime_t now, last;
> >         unsigned long min, max;
> > 
> >         min = 200;
> >         max = 250;
> >         /* timestamp before and after the sleep */
> >         last = ktime_get();
> >         usleep_range(min, max);
> >         now = ktime_get();
> >         /* elapsed time in ns; ktime_to_ns() returns s64, hence %lld */
> >         printk(KERN_INFO "%lld\n", ktime_to_ns(ktime_sub(now, last)));
> >         return 0;
> > }
> > 
> > Results:
> > 
> > usleep_range() 5000 samples - idle system 
> >  100,200         200,200         190,200
> >  Min.   :188481  Min.   :201917  Min.   :197793
> >  1st Qu.:207062  1st Qu.:207057  1st Qu.:207051
> >  Median :207139  Median :207133  Median :207133
> >  Mean   :207254  Mean   :207233  Mean   :207244
> >  3rd Qu.:207341  3rd Qu.:207262  3rd Qu.:207610
> >  Max.   :225340  Max.   :214222  Max.   :214885
> > 
> > 100,200 to 200,200 may have a relevant impact for
> > some systems with respect to the outliers, but
> > mean and median are almost the same. For
> > 190,200 to 200,200 there is statistically no
> > significant difference with respect to performance.
> > Note that the timestamps before and after also have
> > jitter - so only part of the jitter can be attributed
> > to usleep_range() itself. But idle system optimization
> > is not that interesting for most systems.
> 
> I disagree here. Most systems are idle, most of the time. You say
> that basically everyone should provide 50 usec of slack... So I guess
> I'd like to see comparisons for 200,200 and 200,250 (and perhaps also
> 200,500 or something).
>
I did not say that everyone should use 50us of slack - rather the statement
was "makes no relevant difference if you give usleep_range slack of a few
microseconds.", that min==max makes *no* sense, and that providing
even just a small slack (in the 10s of us range) makes a relevant difference
at system level.

Regarding idle system - the statement is that optimizing for an idle
system makes no sense - not that an idle system is rare. In an idle
system (as you can see in the above table) there is *no* difference
in the mean values - just to highlight this:

  100,200         200,200         190,200
  Mean   :207254  Mean   :207233  Mean   :207244

so for an idle system it makes very little difference (and I still doubt
that anyone could find this sub-per-mille difference by testing at the
application level) - conversely for a loaded system the whole issue is
irrelevant as the jitter is completely dominated by system activity and
the usleep_range() parameters have more or less no impact.
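
To make the "sub-per-mille" arithmetic explicit, here is a minimal
userspace sketch that computes the relative difference between two of the
mean values quoted above. The helper name permille_diff() is hypothetical
and not part of the test module; it only restates the numbers from the table.

```c
#include <assert.h>

/*
 * Relative difference between two means in per mille (parts per
 * thousand). Both arguments must be in the same unit (ns here).
 * Hypothetical helper, for illustration only.
 */
double permille_diff(double mean, double baseline)
{
        assert(baseline > 0.0);
        return (mean - baseline) / baseline * 1000.0;
}
```

With the idle-system means, permille_diff(207254, 207233) is roughly 0.10
and permille_diff(207244, 207233) is roughly 0.05, so both differences are
well below one per mille.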

In summary:
  idle-system: 10s of us difference between min/max has, if at all,
               marginal impact
  loaded-system: no negative impact at all

but the system as a whole can profit from reducing the number of hires
timers it needs to handle. Thus I still see no reason to not consider
usleep_range(min,max) with min==max as a mistake.

But to put a number on it - if max-min < 10us I would consider it wrong.
I think that basically never makes sense for any non-RT (PREEMPT-RT that
is) thread.
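
The rule of thumb above could be written down as a small check, sketched
here in plain userspace C. The names usleep_slack_ok() and MIN_SLACK_US are
hypothetical, the 10us threshold is just the number suggested above, and the
sketch ignores the PREEMPT-RT case where a tighter range may be justified.

```c
#include <stdbool.h>

/* Assumed threshold from the rule of thumb above, in microseconds. */
#define MIN_SLACK_US 10UL

/*
 * Returns true if a (min, max) pair intended for usleep_range() leaves
 * at least MIN_SLACK_US of slack for the timer subsystem to work with.
 * Hypothetical helper, not an existing kernel API.
 */
bool usleep_slack_ok(unsigned long min_us, unsigned long max_us)
{
        if (max_us < min_us)
                return false;   /* inverted range is never valid */
        return max_us - min_us >= MIN_SLACK_US;
}
```

Under this check 200,200 is flagged, while 190,200 and 200,250 pass.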

thx!
hofrat