lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <479F1118.BA47.005A.0@novell.com>
Date:	Tue, 29 Jan 2008 09:42:16 -0700
From:	"Gregory Haskins" <ghaskins@...ell.com>
To:	"Paul Jackson" <pj@....com>
Cc:	<a.p.zijlstra@...llo.nl>, <mingo@...e.hu>,
	<dmitry.adamushko@...il.com>, <rostedt@...dmis.org>,
	<menage@...gle.com>, <rientjes@...gle.com>, <tong.n.li@...el.com>,
	<tglx@...utronix.de>, <akpm@...ux-foundation.org>,
	<dhaval@...ux.vnet.ibm.com>, <vatsa@...ux.vnet.ibm.com>,
	<sgrubb@...hat.com>, <linux-kernel@...r.kernel.org>,
	<ebiederm@...ssion.com>, <nickpiggin@...oo.com.au>
Subject: Re: scheduler scalability - cgroups, cpusets and
	load-balancing

>>> On Tue, Jan 29, 2008 at 11:28 AM, in message
<20080129102836.be614579.pj@....com>, Paul Jackson <pj@....com> wrote: 
> Gregory wrote:
>>   I am a bit confused as to why you disable load-balancing in the
>>   RT cpuset?  It shouldn't be strictly necessary in order for the
>>   RT scheduler to do its job (unless I am misunderstanding what you
>>   are trying to accomplish?).  Do you do this because you *have*
>>   to in order to make real-time deadlines, or because its just a
>>   further optimization?
> 
> My primary motivation for cpusets originally, and for the
> sched_load_balance flag now, was not realtime, but "soft partitioning"
> of big NUMA systems, especially for batch schedulers.  They sometimes
> have large cpusets which are only being used to hold smaller, per-job,
> cpusets.  It is a waste of time (CPU cycles in the kernel sched code)
> to load balance those large cpusets.  Load balancing doesn't scale
> easily to high CPU counts, and it's nice to avoid doing that where
> not needed.

Understood, and that makes tons of sense.

> 
> See the following lkml message for a fuller explanation:
> 
>   http://lkml.org/lkml/2008/1/29/85
> 
> As a secondary motivation, I thought that disabling load balancing on
> the RT cpuset was the right thing to do for RT needs, but I make no
> claim to knowing much about RT.

Well, I make no claim to understand the large batch systems you work on either ;)  Everything you said made a ton of sense other than the RT/load-balance thing, but I think we are on the same page now.

> 
> I just now realized that you added a 'root_domain' in a patch in
> late Nov and early Dec.   I was on the road then, moving from
> California to Texas, and not paying much attention to Linux.

np (though I was wondering why you had no comment before ;)

> 
> A couple of questions on that patch, both involving a comment it adds
> to kernel/sched.c:
> 
> /*
>  * We add the notion of a root-domain which will be used to define per-domain
>  * variables. Each exclusive cpuset essentially defines an island domain by
>  * fully partitioning the member cpus from any other cpuset. Whenever a new
>  * exclusive cpuset is created, we also create and attach a new root-domain
>  * object.
>  */
> 
> 1) What are 'per-domain' variables?

s/per-domain/per-root-domain

> 
> 2) The mention of 'exclusive cpuset' is no longer correct.
> 
>    With the patch 'remove sched domain hooks from cpusets' cpusets
>    no longer defines sched domains using the cpu_exclusive flag.
> 
>    With the subsequent sched_load_balance patch (see
>    http://lkml.org/lkml/2007/10/6/19) cpusets uses a new per-cpuset
>    flag 'sched_load_balance' to define sched domains.

Doh!  Thanks for the heads up.  

> 
> The following revised comment might be more accurate:
> 
> /*
>  * We add the notion of a root-domain which will be used to define per-domain
>  * variables.  Each non-overlapping sched domain defines an island domain by
>  * fully partitioning the member cpus from any other cpuset. Whenever a new
>  * such a sched domain is created, we also create and attach a new 
> root-domain
>  * object.  These non-overlapping sched domains are determined by the cpuset
>  * configuration, via a call to partition_sched_domains().
>  */
> 
> It sounds like you (Gregory, others) want your RT CPUs to be in a sched
> domain, unlike the current way things are, where my cpuset code
> carefully avoids setting up a sched domain for those CPUs.  However I
> still have need, in the batch scheduler case explained above, to have
> some CPUs not in any sched domain.
> 
> If you require these RT sched domains to be setup differently somehow,
> in some way that is visible to partition_sched_domains, then that
> apparently means we need a per-cpuset flag to mark those RT cpusets.

I think we only need a plain-vanilla partition, so no flags should be necessary.

-Greg

> 
> If you just want an ordinary sched domain setup (just so long as it
> contains only the intended RT CPUs, not others) then I guess we don't
> technically need any more per-cpuset flags, but I'm worried, because
> the API we're presenting to users for this has just gone from subtle to
> bizarre.  I suspect I'll want to add a flag anyway, if by doing so, I
> can make the kernel-user API, via cpusets, easier to understand.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ