Date:	Wed, 9 Sep 2009 14:20:06 +0200
From:	Jens Axboe <jens.axboe@...cle.com>
To:	Mike Galbraith <efault@....de>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Con Kolivas <kernel@...ivas.org>, linux-kernel@...r.kernel.org
Subject: Re: BFS vs. mainline scheduler benchmarks and measurements

On Wed, Sep 09 2009, Jens Axboe wrote:
> On Wed, Sep 09 2009, Jens Axboe wrote:
> > On Wed, Sep 09 2009, Mike Galbraith wrote:
> > > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > > * Jens Axboe <jens.axboe@...cle.com> wrote:
> > > > 
> > > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > > And here's a newer version.
> > > > > > 
> > > > > > I tinkered a bit with your proglet and finally found the 
> > > > > > problem.
> > > > > > 
> > > > > > You used a single pipe per child, this means the loop in 
> > > > > > run_child() would consume what it just wrote out until it got 
> > > > > > force preempted by the parent which would also get woken.
> > > > > > 
> > > > > > This results in the child spinning a while (its full quota) and 
> > > > > > only reporting the last timestamp to the parent.
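
A minimal hypothetical sketch of the problem (not the actual proglet code): with one pipe serving both directions, the child can consume its own writes.

    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/time.h>

    int main(void)
    {
        int fd[2];                /* ONE pipe, used for both directions */
        struct timeval tv;

        if (pipe(fd) < 0)
            exit(1);

        if (fork() == 0) {
            /* child: writes a timestamp, then reads from the same pipe,
             * consuming its own data until it is force preempted */
            for (;;) {
                gettimeofday(&tv, NULL);
                write(fd[1], &tv, sizeof(tv));
                read(fd[0], &tv, sizeof(tv));
            }
        }
        /* parent: mostly sees only the last leftover timestamp; the fix
         * is one pipe per direction, so the child blocks on the parent */
        read(fd[0], &tv, sizeof(tv));
        return 0;
    }
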
> > > > > 
> > > > > Oh doh, that's not well thought out. Well, it was a quick hack :-)
> > > > > Thanks for the fixup; now it's at least usable to some degree.
> > > > 
> > > > What kind of latencies does it report on your box?
> > > > 
> > > > Our vanilla scheduler default latency targets are:
> > > > 
> > > >   single-core: 20 msecs
> > > >     dual-core: 40 msecs
> > > >     quad-core: 60 msecs
> > > >     octo-core: 80 msecs
> > > > 
> > > > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via 
> > > > /proc/sys/kernel/sched_latency_ns:
> > > > 
> > > >    echo 10000000 > /proc/sys/kernel/sched_latency_ns
> > > 
> > > He would also need to lower min_granularity; otherwise, it'd be larger
> > > than the whole latency target.
> > > 
> > > I'm testing right now, and one thing that is definitely a problem is the
> > > amount of sleeper fairness we're giving.  A full latency is just too
> > > much short-term fairness in my testing.  While sleepers are catching up,
> > > hogs languish.  That's the biggest issue going on.
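
For reference, the mechanism in question: on wakeup, CFS places a sleeper's vruntime up to one full latency period behind min_vruntime, so it runs ahead of already-running tasks until that credit is used up. A simplified sketch of the idea, not the exact mainline code:

    /* Simplified sketch of CFS sleeper placement; the real place_entity()
     * has more cases, this shows only the sleeper-credit idea. */
    static void place_entity_sketch(struct cfs_rq *cfs_rq,
                                    struct sched_entity *se)
    {
        u64 vruntime = cfs_rq->min_vruntime;

        /* sleeper credit: up to one full latency period */
        vruntime -= sysctl_sched_latency;

        /* never let a task gain time by being placed backwards */
        se->vruntime = max_vruntime(se->vruntime, vruntime);
    }

With the full latency subtracted, a waking sleeper gets the whole window of credit at the expense of the running hogs.
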
> > > 
> > > I've also been doing some timings of make -j4 (looking at idle time),
> > > and find that child_runs_first is mildly detrimental to fork/exec load,
> > > as are buddies.
> > > 
> > > I'm running with the below at the moment.  (The kthread/workqueue thing
> > > is just because I don't see any reason for it to exist, so consider it
> > > a waste of perfectly good math. ;)
> > 
> > Using latt, it seems better than -rc9. Below are entries logged
> > while running make -j128 on a 64-thread box. I did two runs on each, and
> > latt is using 8 clients.
> > 
> > -rc9
> >         Max                23772 usec
> >         Avg                 1129 usec
> >         Stdev               4328 usec
> >         Stdev mean           117 usec
> > 
> >         Max                32709 usec
> >         Avg                 1467 usec
> >         Stdev               5095 usec
> >         Stdev mean           136 usec
> > 
> > -rc9 + patch
> > 
> >         Max                11561 usec
> >         Avg                 1532 usec
> >         Stdev               1994 usec
> >         Stdev mean            48 usec
> > 
> >         Max                 9590 usec
> >         Avg                 1550 usec
> >         Stdev               2051 usec
> >         Stdev mean            50 usec
> > 
> > Max latency is way down, and the variation is much smaller as well.
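
A note on reading these fields: assuming latt computes the usual statistics, Stdev is the sample standard deviation of the individual latencies, and "Stdev mean" is the standard error of the mean, i.e. stdev / sqrt(nr_samples). A sketch of that computation:

    /* Sketch of the four reported fields, assuming latt uses the
     * standard definitions (max, mean, sample stdev, standard error). */
    #include <math.h>

    static void summarize(const double *lat, int n, double *max,
                          double *avg, double *stdev, double *stdev_mean)
    {
        double sum = 0.0, sq = 0.0;
        int i;

        *max = 0.0;
        for (i = 0; i < n; i++) {
            if (lat[i] > *max)
                *max = lat[i];
            sum += lat[i];
        }
        *avg = sum / n;
        for (i = 0; i < n; i++)
            sq += (lat[i] - *avg) * (lat[i] - *avg);
        *stdev = sqrt(sq / (n - 1));
        *stdev_mean = *stdev / sqrt(n);  /* standard error of the mean */
    }
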
> 
> Things are much better with this patch on the notebook! I cannot compare
> with BFS, as that still doesn't run anywhere I want it to run, but it's
> way better than stock -rc9-git. The latt numbers on the notebook show 1/3
> of the max latency, a lower average, and a much smaller stddev too.

BFS210 runs on the laptop (dual-core Intel Core Duo). With make -j4
running, I clock the following latt -c8 'sleep 10' latencies:

-rc9

        Max                17895 usec
        Avg                 8028 usec
        Stdev               5948 usec
        Stdev mean           405 usec

        Max                17896 usec
        Avg                 4951 usec
        Stdev               6278 usec
        Stdev mean           427 usec

        Max                17885 usec
        Avg                 5526 usec
        Stdev               6819 usec
        Stdev mean           464 usec

-rc9 + mike

        Max                 6061 usec
        Avg                 3797 usec
        Stdev               1726 usec
        Stdev mean           117 usec

        Max                 5122 usec
        Avg                 3958 usec
        Stdev               1697 usec
        Stdev mean           115 usec

        Max                 6691 usec
        Avg                 2130 usec
        Stdev               2165 usec
        Stdev mean           147 usec

-rc9 + bfs210

        Max                   92 usec
        Avg                   27 usec
        Stdev                 19 usec
        Stdev mean             1 usec

        Max                   80 usec
        Avg                   23 usec
        Stdev                 15 usec
        Stdev mean             1 usec

        Max                   97 usec
        Avg                   27 usec
        Stdev                 21 usec
        Stdev mean             1 usec
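
For context on what latt measures: roughly, the parent wakes sleeping clients through pipes and records how long each wakeup takes while the given workload runs. A hypothetical minimal version of the idea (not latt's actual source; real latt runs multiple clients and reports the statistics above):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/time.h>

    /* microseconds elapsed since *then */
    static double usecs_since(const struct timeval *then)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        return (now.tv_sec - then->tv_sec) * 1e6 +
               (now.tv_usec - then->tv_usec);
    }

    int main(void)
    {
        int fd[2];
        struct timeval tv;

        if (pipe(fd) < 0)
            exit(1);

        if (fork() == 0) {
            /* client: sleeps on the pipe; the wakeup latency is the time
             * from the parent's write until the client actually runs */
            while (read(fd[0], &tv, sizeof(tv)) == sizeof(tv))
                printf("wakeup latency: %.0f usec\n", usecs_since(&tv));
            exit(0);
        }
        for (;;) {
            gettimeofday(&tv, NULL);
            write(fd[1], &tv, sizeof(tv));
            usleep(100000);     /* pace the wakeups */
        }
    }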

One thing I also noticed is that when I have logged in, I run xmodmap
manually to load some key mappings (I always tell myself to add this to
the login scripts, but I suspend/resume this laptop for weeks at a
time and forget before the next boot). With the stock kernel, xmodmap
halts X updates and takes forever to run. With BFS, it returned
instantly, as I would expect.

So the BFS design may be lacking on the scalability end (which is
obviously true if you look at the code), but I can understand the
appeal of the scheduler for "normal" desktop users.

-- 
Jens Axboe

