Date:	Tue, 5 May 2015 10:41:04 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Zefan Li <lizefan@...wei.com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Cgroups <cgroups@...r.kernel.org>
Subject: Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

Hello, Peter.

On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > I just realized we allow removing/adding controllers from/to cgroups
> > while there are tasks in them, which isn't safe unless we eliminate all
> > can_attach callbacks. We've done so for some cgroup subsystems, but
> > there are still a few of them...
> 
> You can't remove can_attach(), we must be able to disallow joining a
> cgroup.
> 
> If that results in you not being able to change the cgroup setup with
> tasks in, so be it -- that seems like a sane restriction anyhow.

This is really an interface policy issue.  For all other controllers,
it's almost trivial to let organizational operations (setting up
hierarchies, moving processes around) overrule controller
configurations.  The main benefit of doing this is that this decouples
organizational operations from resource control.  Users can depend on
the fact that allowed organizational operations won't fail due to
specific controller configuration issues.

This also works well with controllers accepting target configurations
regardless of the current state and enforcing rules to converge to the
configured state instead.  E.g. if you set max memory lower than
current usage, the config will be accepted and the controller will
keep trying to make the current state converge to the target state.
This is important as rejecting configuration can lead to a chasing
game between configuration attempts and run-away resource consumption.
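
As an illustration only, the "accept the target, then converge" model
described above could be sketched as a toy in Python.  Nothing here is
kernel code: the class, its methods, and the batch size are all
hypothetical names chosen for the sketch.

```python
# Toy model only; every name here is hypothetical, not a kernel API.
class ToyMemoryController:
    def __init__(self, usage):
        self.usage = usage   # units currently charged to the group
        self.limit = None    # configured target; None means unlimited

    def set_max(self, limit):
        # Always accept the new target, even when it is below the
        # current usage.  The write itself never fails.
        self.limit = limit

    def reclaim_step(self, batch):
        # Move the current state one step toward the configured target
        # and report how much was freed in this step.
        if self.limit is None or self.usage <= self.limit:
            return 0
        freed = min(batch, self.usage - self.limit)
        self.usage -= freed
        return freed

    def converge(self, batch=32, max_steps=1000):
        # Keep reclaiming until the target is met or we give up.
        steps = 0
        while steps < max_steps and self.reclaim_step(batch) > 0:
            steps += 1
        return steps
```

The point of the sketch is that set_max() has no failure path, so a
configuration attempt cannot lose a race against growing usage; only
the convergence work happens after the fact.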

Now, RR slices are the special case here because they're inherently
different from every other resource cgroup is concerned with.  They
simply don't fit into the same model that other resources follow.
There are several options we can try.

1. Decouple RR slices from cpu controller.  This would be the best
   route to follow.  RR slices need a hard allocator no matter what we
   do.  There isn't much point in imposing hierarchical structure on
   top of it.

2. Implement special case behavior so that it can follow the same
   model.  e.g. resetting RR scheduling config when the effective cpu
   cgroup changes or carrying the amount of slice being consumed with
   the process being moved.  No matter how this is done, it's going to
   be a clear compromise as we're forcing this into a model which
   doesn't quite fit it.  That said, given how RR slices are a special
   case to begin with, I think this can be acceptable.

3. Make the compromise in the other direction: add exceptions to
   organizational operations but clearly limit the failure modes.  We
   probably want to structure the code in a way that enforces this.

4. If #1 can be done in time but not right now, simply disallow any
   RR/FIFO in !root cgroups on the unified hierarchy for now.
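
For illustration, option 4 can be read as a simple predicate over the
scheduling policy and the cgroup a task is in.  The following is a toy
sketch in Python, not kernel code: ToyCgroup, rt_policy_allowed() and
the policy strings are hypothetical, and the choice to leave non-unified
hierarchies unrestricted is an assumption of the sketch.  Where such a
check would actually be enforced is not specified above.

```python
from dataclasses import dataclass

# Policy names used as plain strings for the sketch.
RT_POLICIES = {"SCHED_FIFO", "SCHED_RR"}


@dataclass
class ToyCgroup:
    # Hypothetical stand-in for a cgroup; not a kernel structure.
    path: str
    on_unified_hierarchy: bool

    @property
    def is_root(self):
        return self.path == "/"


def rt_policy_allowed(cgroup, policy):
    """Toy reading of option 4: no RR/FIFO in !root cgroups on the
    unified hierarchy."""
    if policy not in RT_POLICIES:
        return True
    if not cgroup.on_unified_hierarchy:
        # Assumption: other hierarchies keep their existing behavior.
        return True
    return cgroup.is_root
```

With a rule like this, the restriction depends only on the policy and
on whether the cgroup is root, not on any per-cgroup configuration.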

What do you think?

Thanks.

-- 
tejun