Message-ID: <20180531122638.GJ12180@hirez.programming.kicks-ass.net>
Date:   Thu, 31 May 2018 14:26:38 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Waiman Long <longman@...hat.com>
Cc:     Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Ingo Molnar <mingo@...hat.com>, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        kernel-team@...com, pjt@...gle.com, luto@...capital.net,
        Mike Galbraith <efault@....de>, torvalds@...ux-foundation.org,
        Roman Gushchin <guro@...com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v9 3/7] cpuset: Add cpuset.sched.load_balance flag to v2

On Tue, May 29, 2018 at 09:41:30AM -0400, Waiman Long wrote:
> The sched.load_balance flag is needed to enable CPU isolation similar to
> what can be done with the "isolcpus" kernel boot parameter. Its value
> can only be changed in a scheduling domain with no child cpusets. On
> a non-scheduling domain cpuset, the value of sched.load_balance is
> inherited from its parent. This is to make sure that all the cpusets
> within the same scheduling domain or partition have the same load
> balancing state.
> 
> This flag is set by the parent and is not delegatable.
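
(For reference, isolcpus= is the boot-time interface being emulated
here; a minimal sketch, assuming CPUs 2-3 are the ones to isolate:

	# appended to the kernel command line in the bootloader entry
	isolcpus=2,3

Unlike the boot parameter, the cgroup flags below can be changed at
runtime.)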

> +  cpuset.sched.domain_root
> +	A read-write single value file which exists on non-root
> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
> +	either "0" (off) or "1" (on).  This flag is set by the parent
> +	and is not delegatable.
> +
> +	If set, it indicates that the current cgroup is the root of a
> +	new scheduling domain or partition that comprises itself and
> +	all its descendants except those that are scheduling domain
> +	roots themselves and their descendants.  The root cgroup is
> +	always a scheduling domain root.
> +
> +	There are constraints on where this flag can be set.  It can
> +	only be set in a cgroup if all the following conditions are true.
> +
> +	1) "cpuset.cpus" is not empty and the list of CPUs is
> +	   exclusive, i.e. they are not shared by any of its siblings.
> +	2) The parent cgroup is also a scheduling domain root.
> +	3) There are no child cgroups with cpuset enabled.  This
> +	   eliminates corner cases that would otherwise have to be
> +	   handled.
> +
> +	Setting this flag will take the CPUs away from the effective
> +	CPUs of the parent cgroup.  Once it is set, this flag cannot
> +	be cleared if there are any child cgroups with cpuset enabled.
> +	Further changes to "cpuset.cpus" are allowed as long as the
> +	first condition above is still true.
> +
> +	A parent scheduling domain root cgroup cannot distribute all
> +	its CPUs to its child scheduling domain root cgroups unless
> +	its load balancing flag is turned off.
> +
> +  cpuset.sched.load_balance
> +	A read-write single value file which exists on non-root
> +	cpuset-enabled cgroups.  It is a binary value flag that accepts
> +	either "0" (off) or "1" (on).  This flag is set by the parent
> +	and is not delegatable.  It is on by default in the root cgroup.
> +
> +	When it is on, tasks within this cpuset will be load-balanced
> +	by the kernel scheduler.  Tasks will periodically be moved
> +	from CPUs with high load to less loaded CPUs within the
> +	same cpuset.
> +
> +	When it is off, there will be no load balancing among the
> +	CPUs of this cgroup.  Tasks will stay on the CPUs they are
> +	running on and will not be moved to other CPUs.
> +
> +	The load balancing state of a cgroup can only be changed on a
> +	scheduling domain root cgroup with no cpuset-enabled children.
> +	All cgroups within a scheduling domain or partition must have
> +	the same load balancing state.  As descendant cgroups of a
> +	scheduling domain root are created, they inherit the load
> +	balancing state of their root.

I still find all that a bit weird.

So load_balance=0 basically changes a partition into a
'fully-partitioned partition' with the seemingly random side-effect that
now sub-partitions are allowed to consume all CPUs.

The rationale, only given in the Changelog above, seems to be to allow
'easy' emulation of isolcpus.
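
As far as I can tell that would look something like this (an untested
sketch against the interface quoted above, assuming cgroup2 is
mounted at /sys/fs/cgroup and CPUs 2-3 are the set to isolate):

	cd /sys/fs/cgroup
	echo +cpuset > cgroup.subtree_control       # enable cpuset in children
	mkdir isolated
	echo 2-3 > isolated/cpuset.cpus             # exclusive CPUs (condition 1)
	echo 1 > isolated/cpuset.sched.domain_root  # new partition root
	echo 0 > isolated/cpuset.sched.load_balance # no balancing within it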

I'm still not convinced this is a useful knob to have. You can do
fully-partitioned by simply creating a lot of 1-CPU partitions.
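
Something like this, say (same caveats as the sketch above):

	cd /sys/fs/cgroup                 # cpuset enabled in children as above
	for cpu in 2 3; do                # one single-CPU partition per CPU
		mkdir part$cpu
		echo $cpu > part$cpu/cpuset.cpus
		echo 1 > part$cpu/cpuset.sched.domain_root
	done

A one-CPU domain has nothing to balance, so the load_balance knob
never enters the picture.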

So this one knob does two separate things, both of which seem, to me,
redundant.

Can we please get better rationale for this?
