Date:   Thu, 22 Mar 2018 09:41:20 +0100
From:   Juri Lelli <juri.lelli@...hat.com>
To:     Waiman Long <longman@...hat.com>
Cc:     Tejun Heo <tj@...nel.org>, Li Zefan <lizefan@...wei.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        kernel-team@...com, pjt@...gle.com, luto@...capital.net,
        efault@....de, torvalds@...ux-foundation.org,
        Roman Gushchin <guro@...com>
Subject: Re: [PATCH v6 2/2] cpuset: Add cpuset.sched_load_balance to v2

Hi Waiman,

On 21/03/18 12:21, Waiman Long wrote:
> The sched_load_balance flag is needed to enable CPU isolation similar
> to what can be done with the "isolcpus" kernel boot parameter.
> 
> The sched_load_balance flag implies an implicit !cpu_exclusive as
> it doesn't make sense to have an isolated CPU being load-balanced in
> another cpuset.
> 
> For v2, this flag is hierarchical and is inherited by child cpusets. It
> is not allowed to have this flag turned off in a parent cpuset while
> it is on in a child cpuset.
> 
> This flag is set by the parent and is not delegatable.
> 
> Signed-off-by: Waiman Long <longman@...hat.com>
> ---
>  Documentation/cgroup-v2.txt | 22 ++++++++++++++++++
>  kernel/cgroup/cpuset.c      | 56 +++++++++++++++++++++++++++++++++++++++------
>  2 files changed, 71 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index ed8ec66..c970bd7 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1514,6 +1514,28 @@ Cpuset Interface Files
>  	it is a subset of "cpuset.mems".  Its value will be affected
>  	by memory nodes hotplug events.
>  
> +  cpuset.sched_load_balance
> +	A read-write single value file which exists on non-root cgroups.
> +	The default is "1" (on), and the other possible value is "0"
> +	(off).
> +
> +	When it is on, tasks within this cpuset will be load-balanced
> +	by the kernel scheduler.  Tasks will be moved from CPUs with
> +	high load to other CPUs within the same cpuset with less load
> +	periodically.
> +
> +	When it is off, there will be no load balancing among CPUs on
> +	this cgroup.  Tasks will stay in the CPUs they are running on
> +	and will not be moved to other CPUs.
> +
> +	This flag is hierarchical and is inherited by child cpusets. It
> +	can be turned off only when the CPUs in this cpuset aren't
> +	listed in the cpuset.cpus of other sibling cgroups, and all
> +	the child cpusets, if present, have this flag turned off.
> +
> +	Once it is off, it cannot be turned back on as long as the
> +	parent cgroup still has this flag in the off state.
> +

I'm afraid that this will not work for SCHED_DEADLINE (at least as it
is implemented today). As you can see in the Documentation [1], the only
way a user has to perform partitioned/clustered scheduling is to create
subsets of exclusive cpusets and then assign deadline tasks to them. The
other thing to take into account here is that a root_domain is created
for each exclusive set, and we use such a root_domain to keep information
about admitted bandwidth and to speed up load-balancing decisions (there
is a max heap tracking deadlines of active tasks on each root_domain).

Now, AFAIR, distinct root_domain(s) are created when the parent group has
sched_load_balance disabled and cpu_exclusive set (in cgroup v1, that
is). So, what we normally do is create, say, cpu_exclusive groups for
the different clusters and then disable sched_load_balance at the root
level (so that each cluster gets its own root_domain). Also,
sched_load_balance is enabled in the children groups (as load balancing
inside the clusters is what we actually need :).
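
Concretely, the v1 workflow I mean looks roughly like this (just a
sketch for a hypothetical two-cluster machine with CPUs 0-3 and 4-7;
paths and ranges are examples, and it assumes the cpuset controller is
mounted in v1 mode):

```shell
# Mount the v1 cpuset controller (skip if already mounted)
mount -t cgroup -o cpuset none /sys/fs/cgroup/cpuset
cd /sys/fs/cgroup/cpuset

# One exclusive cpuset per cluster
mkdir cluster0 cluster1
echo 0-3 > cluster0/cpuset.cpus
echo 0   > cluster0/cpuset.mems
echo 1   > cluster0/cpuset.cpu_exclusive
echo 4-7 > cluster1/cpuset.cpus
echo 0   > cluster1/cpuset.mems
echo 1   > cluster1/cpuset.cpu_exclusive

# Disable load balancing at the root so that each exclusive child
# gets its own root_domain
echo 0 > cpuset.sched_load_balance

# Load balancing stays enabled inside each cluster (the default is 1),
# so deadline tasks can then be admitted per cluster
echo $$ > cluster0/tasks
```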

IIUC your proposal, this will not be permitted with cgroup v2, because
sched_load_balance won't be present at the root level and children groups
won't be able to set sched_load_balance back to 1 if it was set to 0
in some parent. Is that true?
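
In other words, the sequence we rely on today seems to become invalid.
As I read the patch (hypothetical v2 paths, assuming the cpuset
controller is already enabled in cgroup.subtree_control), the second
write below would be rejected:

```shell
cd /sys/fs/cgroup                # cgroup v2 hierarchy
mkdir -p parent/child            # example group names

echo 0 > parent/cpuset.sched_load_balance
# child inherits the "off" state; turning it back on while the
# parent is still off is forbidden by the proposed rules:
echo 1 > parent/child/cpuset.sched_load_balance   # -> should fail
```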

Look, the way things work today is most probably not perfect (to name
just one thing, we need to disable load balancing for all classes at the
root level just because DEADLINE wants to set restricted affinities for
its tasks :/), and we could probably think about how to change how this
all works. So, let's first make sure I correctly understand what you are
proposing (and its implications). :)

Best,

- Juri

[1] https://elixir.bootlin.com/linux/v4.16-rc6/source/Documentation/scheduler/sched-deadline.txt#L640
