Date: Wed, 9 Sep 2009 20:04:04 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Jens Axboe <jens.axboe@...cle.com>
Cc: Mike Galbraith <efault@....de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Con Kolivas <kernel@...ivas.org>, linux-kernel@...r.kernel.org
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

* Jens Axboe <jens.axboe@...cle.com> wrote:
> On Wed, Sep 09 2009, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Jens Axboe wrote:
> > > On Wed, Sep 09 2009, Mike Galbraith wrote:
> > > > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > > > * Jens Axboe <jens.axboe@...cle.com> wrote:
> > > > >
> > > > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > > > And here's a newer version.
> > > > > > >
> > > > > > > I tinkered a bit with your proglet and finally found the
> > > > > > > problem.
> > > > > > >
> > > > > > > You used a single pipe per child, this means the loop in
> > > > > > > run_child() would consume what it just wrote out until it got
> > > > > > > force preempted by the parent which would also get woken.
> > > > > > >
> > > > > > > This results in the child spinning a while (its full quota) and
> > > > > > > only reporting the last timestamp to the parent.
> > > > > >
> > > > > > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > > > > > Thanks for the fixup, now it's at least usable to some degree.
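
[ Side note: a minimal sketch of the one-pipe-per-direction pattern
  that avoids the self-consumption pitfall described above - purely
  illustrative, not the actual latt.c fix:

	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		int to_child[2], to_parent[2];	/* one pipe per direction */
		char token;

		if (pipe(to_child) || pipe(to_parent))
			exit(1);

		if (fork() == 0) {
			/*
			 * Child: block in read() until the parent wakes
			 * us, then report back. It can never consume its
			 * own writes, since each direction has its own
			 * pipe.
			 */
			while (read(to_child[0], &token, 1) == 1)
				write(to_parent[1], &token, 1);
			exit(0);
		}

		write(to_child[1], "x", 1);	/* wake the child */
		read(to_parent[0], &token, 1);	/* wait for its report */
		return 0;
	}

  With a single shared pipe, the child's read() can return the byte
  the child itself just wrote, which is what kept it spinning here. ]
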
> > > > >
> > > > > What kind of latencies does it report on your box?
> > > > >
> > > > > Our vanilla scheduler default latency targets are:
> > > > >
> > > > >   single-core: 20 msecs
> > > > >   dual-core:   40 msecs
> > > > >   quad-core:   60 msecs
> > > > >   octo-core:   80 msecs
> > > > >
> > > > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> > > > > /proc/sys/kernel/sched_latency_ns:
> > > > >
> > > > > echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > >
> > > > He would also need to lower min_granularity, otherwise it'd be larger
> > > > than the whole latency target.
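
[ For example - values purely illustrative, in nanoseconds:

	echo 10000000 > /proc/sys/kernel/sched_latency_ns
	echo  2000000 > /proc/sys/kernel/sched_min_granularity_ns

  ... which keeps the minimum granularity below the latency target. ]
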
> > > >
> > > > I'm testing right now, and one thing that is definitely a problem is the
> > > > amount of sleeper fairness we're giving. A full latency is just too
> > > > much short term fairness in my testing. While sleepers are catching up,
> > > > hogs languish. That's the biggest issue going on.
> > > >
> > > > I've also been doing some timings of make -j4 (looking at idle time),
> > > > and find that child_runs_first is mildly detrimental to fork/exec load,
> > > > as are buddies.
> > > >
> > > > I'm running with the below at the moment. (the kthread/workqueue thing
> > > > is just because I don't see any reason for it to exist, so consider it
> > > > to be a waste of perfectly good math;)
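
[ The child_runs_first behaviour Mike mentions can also be toggled at
  runtime with CONFIG_SCHED_DEBUG - illustrative:

	echo 0 > /proc/sys/kernel/sched_child_runs_first

  ]
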
> > >
> > > Using latt, it seems better than -rc9. The entries below were logged
> > > while running make -j128 on a 64-thread box. I did two runs on each,
> > > and latt is using 8 clients.
> > >
> > > -rc9
> > > Max 23772 usec
> > > Avg 1129 usec
> > > Stdev 4328 usec
> > > Stdev mean 117 usec
> > >
> > > Max 32709 usec
> > > Avg 1467 usec
> > > Stdev 5095 usec
> > > Stdev mean 136 usec
> > >
> > > -rc9 + patch
> > >
> > > Max 11561 usec
> > > Avg 1532 usec
> > > Stdev 1994 usec
> > > Stdev mean 48 usec
> > >
> > > Max 9590 usec
> > > Avg 1550 usec
> > > Stdev 2051 usec
> > > Stdev mean 50 usec
> > >
> > > Max latency is way down, and the variation is much smaller as well.
> >
> > Things are much better with this patch on the notebook! I cannot compare
> > with BFS as that still doesn't run anywhere I want it to run, but it's
> > way better than -rc9-git stock. latt numbers on the notebook have 1/3
> > the max latency, average is lower, and stddev is much smaller too.
>
> BFS210 runs on the laptop (dual core Intel Core Duo). With make -j4
> running, I clock the following latt -c8 'sleep 10' latencies:
>
> -rc9
>
> Max 17895 usec
> Avg 8028 usec
> Stdev 5948 usec
> Stdev mean 405 usec
>
> Max 17896 usec
> Avg 4951 usec
> Stdev 6278 usec
> Stdev mean 427 usec
>
> Max 17885 usec
> Avg 5526 usec
> Stdev 6819 usec
> Stdev mean 464 usec
>
> -rc9 + mike
>
> Max 6061 usec
> Avg 3797 usec
> Stdev 1726 usec
> Stdev mean 117 usec
>
> Max 5122 usec
> Avg 3958 usec
> Stdev 1697 usec
> Stdev mean 115 usec
>
> Max 6691 usec
> Avg 2130 usec
> Stdev 2165 usec
> Stdev mean 147 usec
At least in my tests these latencies were mainly due to a bug in
latt.c - i've attached the fixed version.

The other reason was wakeup batching. If you do this:

  echo 0 > /proc/sys/kernel/sched_wakeup_granularity_ns

... then you can switch on insta-wakeups on -tip too.

With a dual-core box and a make -j4 background job running, on
latest -tip i get the following latencies:

 $ ./latt -c8 sleep 30
 Entries: 656 (clients=8)

 Averages:
 ------------------------------
 	Max	 158 usec
 	Avg	  12 usec
 	Stdev	  10 usec
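
[ For the curious: the core of each latt sample is conceptually
  simple - the waker timestamps the wakeup it issues, the wakee
  timestamps the moment it actually runs, and the difference is the
  scheduling latency. A minimal sketch of one such sample follows;
  this is an assumed scheme for illustration, not the attached latt.c
  (build with -lrt on older glibc):

	#include <stdio.h>
	#include <time.h>
	#include <unistd.h>

	static long long ts_ns(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1000000000LL + ts.tv_nsec;
	}

	int main(void)
	{
		int p[2];
		long long t0, t1;

		pipe(p);
		if (fork() == 0) {
			/* child: sleep in read() until the parent wakes us */
			read(p[0], &t0, sizeof(t0));
			t1 = ts_ns();
			printf("wakeup latency: %lld usec\n", (t1 - t0) / 1000);
			_exit(0);
		}
		t0 = ts_ns();
		write(p[1], &t0, sizeof(t0));	/* wake the child, passing t0 */
		sleep(1);			/* give the child time to print */
		return 0;
	}

  ]
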
Thanks,

	Ingo
[-- Attachment: latt.c (text/plain, 9068 bytes) --]