Message-ID: <20070409175143.GV2986@holomorphy.com>
Date:	Mon, 9 Apr 2007 10:51:43 -0700
From:	William Lee Irwin III <wli@...omorphy.com>
To:	Mike Galbraith <efault@....de>
Cc:	Ingo Molnar <mingo@...e.hu>, Gene Heskett <gene.heskett@...il.com>,
	linux-kernel@...r.kernel.org, Con Kolivas <kernel@...ivas.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	ck list <ck@....kolivas.org>
Subject: Re: Ten percent test

On Sat, 2007-04-07 at 20:08 +0200, Ingo Molnar wrote:
>> not many - and i dont think Mike tested any of these - Mike tested 
>> pretty low make -j values (Mike, can you confirm?).

On Sat, Apr 07, 2007 at 09:14:21PM +0200, Mike Galbraith wrote:
> Yes.  I don't test anything more than make -j5 when looking at
> interactivity, and make -j nr_cpus+1 is my must have yardstick.

I strongly suggest assembling a battery of cleanly and properly written,
configurable testcases, and scripting a series of regression tests as
opposed to just randomly running kernel compiles and relying on Braille.
For instance, one thing to include in such a battery would be a program
that spawns a set of tasks with some spectrum of interactive vs.
noninteractive behaviors, and maybe priorities too, according to
command-line flags, then measures and reports the distribution of CPU
bandwidth between them, with some notion of success or failure and of
performance within the realm of success. Different sorts of cooperating
processes attempting to defeat whatever guarantees the scheduler is
intended to provide would also make good testcases, particularly if
they're arranged so as to automatically report success or failure in
their attempts to defeat the scheduler (which even irman2.c, while
otherwise quite good, fails to do).
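
To make that concrete, here's the sort of skeleton I have in mind. It's
an untested sketch; the 10% duty cycle, the nice level handling, the
defaults, and the 25% tolerance are knobs I pulled out of the air, not
targets any scheduler is obliged to hit, and the names are made up:

/*
 * cpudist.c: spawn a mix of pure CPU hogs and "interactive" tasks
 * (a 10% duty cycle of burning vs. sleeping), run for a while, then
 * report how CPU time was distributed and a PASS/FAIL verdict.
 * Untested sketch; all thresholds and defaults are arbitrary.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <sys/resource.h>

static double now(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

static void burn(double secs)		/* busy-wait for secs seconds */
{
	double start = now();
	while (now() - start < secs)
		;
}

int main(int argc, char **argv)
{
	int nhogs = 2, ninter = 2, niceval = 0, runtime = 10;
	double tol = 0.25;
	int i, opt, fail = 0, ntasks;
	pid_t *pids;

	while ((opt = getopt(argc, argv, "h:i:n:t:")) != -1) {
		switch (opt) {
		case 'h': nhogs   = atoi(optarg); break; /* # of pure hogs */
		case 'i': ninter  = atoi(optarg); break; /* # of interactive tasks */
		case 'n': niceval = atoi(optarg); break; /* nice level for the hogs */
		case 't': runtime = atoi(optarg); break; /* seconds to run */
		}
	}
	ntasks = nhogs + ninter;
	pids = calloc(ntasks, sizeof(*pids));

	for (i = 0; i < ntasks; i++) {
		pids[i] = fork();
		if (pids[i])
			continue;		/* parent */
		if (i < ninter) {
			for (;;) {		/* 10% duty cycle */
				burn(0.01);
				usleep(90000);
			}
		} else {
			if (niceval)
				nice(niceval);
			for (;;)		/* pure hog */
				;
		}
	}

	sleep(runtime);
	for (i = 0; i < ntasks; i++)
		kill(pids[i], SIGKILL);

	for (i = 0; i < ntasks; i++) {
		struct rusage ru;
		int status;
		double t, share;

		if (wait4(pids[i], &status, 0, &ru) < 0)
			continue;
		t = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
		    ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
		share = t / runtime;
		printf("%-11s pid %d: %.2fs CPU (%.1f%% of wall clock)\n",
		       i < ninter ? "interactive" : "hog", (int)pids[i],
		       t, 100.0 * share);
		/* crude criterion: interactive tasks should get roughly
		 * their 10% duty cycle, within the (arbitrary) tolerance */
		if (i < ninter && share < 0.10 * (1.0 - tol))
			fail = 1;
	}
	printf("%s\n", fail ? "FAIL" : "PASS");
	return fail;
}

Run as, say, ./cpudist -h4 -i2 -n10 -t30 and you get one line per task
plus a final PASS/FAIL a script can key off of; vary the flags instead
of editing and recompiling.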

IMHO the failure of these threads to converge to some clear conclusion
is in part due to the lack of an agreed-upon set of standards for what
the scheduler should achieve and overreliance on subjective criteria.
The testcase code going around is also somewhat embarrassing.

From the point of view of someone wondering what these schedulers solve,
how any of this is to be demonstrated, and what the status of various
pathological cases is, these threads are a nightmare of subjective
squishiness and a tug-of-war between testcases that are only ever
considered one at a time, need Lindent to be readable, and furthermore
have all their parameters hardcoded. Scripting edits and recompiles is
awkward. Just finding the testcases is also awkward; Con has a
collection of a few, but they've got the aforementioned flaws, and
others going around can only be dredged up from mailing list archive
searches. Plus there's nothing like LTP where they can all be run from
a script with pass/fail reports and/or performance metrics for each.
One patch goes through for one testcase and regressions against the
others are left as open questions.
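
Even a trivial driver along these lines would do -- again an untested
sketch, and the testcase commands it runs are just whatever you pass on
the command line, nothing specific to any scheduler patch -- letting a
script run the whole collection and emit per-testcase PASS/FAIL lines
the way LTP does:

/*
 * runtests.c: run each testcase command given on the command line,
 * treat exit status 0 as PASS and anything else as FAIL, and exit
 * nonzero if anything failed so the run as a whole can be gated on.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
	int i, status, failures = 0;

	for (i = 1; i < argc; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			return 2;
		}
		if (pid == 0) {
			/* run the testcase command via the shell so
			 * arguments can be packed into one string */
			execl("/bin/sh", "sh", "-c", argv[i], (char *)NULL);
			_exit(127);
		}
		if (waitpid(pid, &status, 0) < 0) {
			perror("waitpid");
			return 2;
		}
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
			printf("PASS %s\n", argv[i]);
		} else {
			printf("FAIL %s\n", argv[i]);
			failures++;
		}
	}
	printf("%d of %d testcases failed\n", failures, argc - 1);
	return failures ? 1 : 0;
}

Something like ./runtests './cpudist -h4 -i2 -t30' ./irman2 would then
cover the collection in one go -- provided the individual testcases are
fixed up to exit nonzero on failure, which is exactly what most of the
ones currently circulating don't do.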

Scheduling does have a strong subjective component, but this is too
disorganized to be allowed to pass without comment. Some minimum bar
must be set for schedulers to pass before they're considered correct.
Some method of regression testing must be arranged. And the code to
do such testing should not be complete crap with hardcoded parameters.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
