linux-kernel - Re: [ANNOUNCE] LinSched for v3.3-rc7

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPhKKr-W+nh9Xzmwz83MUpzPuBfkj_TxjiZ+8NA5hndDUQB7bA@mail.gmail.com>
Date:	Wed, 14 Mar 2012 21:08:17 -0700
From:	Dhaval Giani <dhaval.giani@...il.com>
To:	Paul Turner <pjt@...gle.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Peter Zijlstra <peterz@...radead.org>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Benjamin Segall <bsegall@...gle.com>,
	Ranjit Manomohan <ranjitm@...gle.com>,
	Nikhil Rao <ncrao@...gle.com>, jmc@...unc.edu,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Abhishek Srivastava <a.srivastava.800@...il.com>
Subject: Re: [ANNOUNCE] LinSched for v3.3-rc7

[Adding abhishek to the cc]

On Wed, Mar 14, 2012 at 8:58 PM, Paul Turner <pjt@...gle.com> wrote:
> Hi All,
>
> [ Take 2, gmail tried to a non text/plain component into the last email .. ]
>
> Quick start version:
>
> Available under linsched-alpha at:
>  git://git.kernel.org/pub/scm/linux/kernel/git/pjt/linsched.git  .linsched
>
> NOTE: The branch history is still subject to some revision as I am
> still re-partitioning some of the patches.  Once this is complete, I
> will promote linsched-alpha into a linsched branch at which point it
> will no longer be subject to history re-writes.
>
> After checking out the code:
> cd tools/linsched
> make
> cd tests
> ./run_tests.sh basic_tests
> << then try changing some scheduler parameters, e.g. sched_latency,
> and repeating >>
>
> (Note:  The basic_tests are unit-tests, these are calibrated to the
> current scheduler tunables and should strictly be considered sanity
> tests.  Please see the mcarlo-sim work for a more useful testing
> environment.)
>
> Extended version:
>
> First of all, apologies in the delay to posting this -- I know there's
> been a lot of interest.  We made the choice to first rebase to v3.3
> since there were fairly extensive changes, especially within the
> scheduler, that meant we had the opportunity to significantly clean up
> some of the LinSched code.  (For example, previously we were
> processing kernel/sched* using awk as a Makefile step so that we could
> extract the necessary structure information without modifying
> sched.c!)  While the code benefited greatly from this, there were
> several other changes that required fairly extensive modification in
> this process (and in the meanwhile the v3.1 version became less
> representative due to the extent of the above changes); which pushed
> things out much further than I would have liked.  I suppose the moral
> of the story is always release early, and often.
>
> That said, I'm relatively happy with the current state of integration,
> there's certainly some specific areas that can still be greatly
> improved (in particular, the main simulator loop has not had as much
> attention paid as the LinSched<>Kernel interactions and there's a long
> list of TODOs that could be improved there), but things are now mated
> fairly cleanly through the use of a new LinSched architecture.  This
> is a total re-write of almost all LinSched<>Kernel interactions versus
> the previous (2.6.35) version, and has allowed us to now carry almost
> zero modifications against the kernel source.  It's both possible to
> develop/test in place, as well as being patch compatible.  The
> remaining touch-points now total just 20 lines!  Half of these are
> likely mergable, with the other 10 lines being more LinSched specific
> at this point in time, I've broken these down below:
>
> The total damage:
>  include/linux/init.h      |    6 ++++++   (linsched ugliness,
> unfortunately necessary until we boot-strap proper initcall support)
>  include/linux/rcupdate.h  |    3 +++    (only necessary to allow -O0
> compilation which is extremely handy for analyzing the scheduler using
> gdb)
>  kernel/pid.c              |    4 ++++        (linsched ugliness,
> these can go eventually)
>  kernel/sched/fair.c       |    2 +-          (this is just the
> promotion of 1 structure and function from static state which weren't
> published in the sched/ re-factoring that we need from within the
> simulator)
>  kernel/sched/stats.c      |    2 +-
>  kernel/time/timekeeping.c |    3 ++-    (this fixes a time-dilation
> error due to rounding when our clock-source has ns-resolution, e.g.
> shift==1)
>  6 files changed, 17 insertions(+), 3 deletions(-)
>
> Summarized changes vs 2.6.35 (previous version):
>
> - The original LinSched (understandably) simplified many of the kernel
> interactions in order to make simulation easier.  Unfortunately, this
> has serious side-effects on the accuracy of simulation.  We've now
> introduced a large portion of this state, including: irq and soft-irq
> contexts (we now perform periodic load-balance out of SCHED_SOFTIRQ
> for example), support for active load-balancing, correctly modeled
> nohz interactions, ipi and stop-task support.
>
> - Support for record and replay of application scheduling via perf.
> This is not yet well integrated, but under tests exist the tools to
> record an applications behavior using perf sched record, and then play
> it back in the simulator.
>
> - Load-balancer scoring.  This one is a very promising new avenue for
> load-balancer testing.  We analyzed several workloads and found that
> they could be well-modeled using a log-normal distribution.
> Parameterizing these models then allows us to construct a large (500)
> test-case set of randomly generated workloads that behave similarly.
> By integrating the variance between the current load-balance and an
> offline computed (currently greedy first-fit) balance we're able to
> automatically identify and score an approximation of our distance from
> an ideal load-balance.  Historically, such scores are very difficult
> to interpret, however, that's where our ability to generate a large
> set of test-cases above comes in.  This allows us to exploit a nice
> property, it's much easier to design a scoring function that diverges
> (in this case the variance) than a nice stable one that converges.  We
> can then catch regressions in load-balancer quality by measuring the
> divergence in this set of scoring functions across our set of
> test-cases.  This particular feature needs a large set of
> documentation in itself (todo), but to get started with playing with
> it see Makefile.mcarlo-sims in tools/linsched/tests.  In particular to
> evaluate the entire set across a variety of topologies the following
> command can be issued:
>  make -j <num_cpus * 2 > -f Makefile.mcarlo-sims
> (The included 'diff-mcarlo-500' tool can then be used to make
> comparisons across result sets.)
>
> - Validation versus real hardware.  Under tests/validation we've
> included a tool for replaying and recording the above simulations on a
> live-machine.  These can then be compared to simulated runs using the
> tools above to ensure that LinSched is modelling your architecture
> reasonably appropriately.  We did some reasonably extensive
> comparisons versus several x86 topologies in the v3.1 code using this;
> it's a fundamentally hard problem -- in particular there's much more
> clock drift between events on real hardware, but the results showed
> the included topologies to be a reasonable simulacrum under LinSched.
>
> What's to come?
> - More documentation, especially about the use of the new
> load-balancer scoring tools.
> - The history is very coarse right now as a result of going through a
> rebase cement-mixer.  I'd like to incrementally refactor some of the
> larger commits; once this is done I will promote linsched-alpha to a
> stable linsched branch that won't be subject to history re-writes.
> - KBuild integration.  We currently build everything out of the
> tools/linsched makefiles.  One of the immediate TODOs involves
> re-working the arch/linsched half of this to work with kbuild so that
> its less hacky/fragile.
> - Writing up some of the existing TODOs as starting points for anyone
> who wants to get involved.
>
> I'd also like to take a moment to specially recognize the effort of
> the following contributors, all of whom were involved extensively in
> the work above.  Things have come a long way since the 5000 lines of
> "#ifdef LINSCHED", the current status would not be possible without
> them.
>  Ben Segall, Dhaval Giani, Ranjit Manomohan, Nikhil Rao, and Abhishek
> Srivastava
>
> Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/