linux-kernel - Re: boot cgroup questions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 12 Mar 2008 11:24:33 -0700
From:	Max Krasnyanskiy <maxk@...lcomm.com>
To:	Paul Jackson <pj@....com>
CC:	Paul Menage <menage@...gle.com>, mingo@...e.hu,
	a.p.zijlstra@...llo.nl, linux-kernel@...r.kernel.org
Subject: Re: boot cgroup questions

Paul Jackson wrote:
> Paul M wrote:
>>>  they will now have to unset it in the 'boot' set as well.
>> That can break existing userspace, so I presume PaulJ isn't in favour
>> of this change.
> 
> You're right - I don't favor it.
Hmm, I think we're mixing two different threads here.
	1. How to map irq affinity handling onto cpusets.
	2. Whether and how to create in kernel 'boot' cgroup/cpuset.
They are somewhat orthogonal imho. In a sense that no mater how we decide to 
handle irqs (even if we do not do them under cpusets at all) we may still want 
'boot' group. As I mentioned at the beginning of this thread 'boot group/set 
is basically just a convenience feature. The only difference between root/top 
  group is that 'boot' group can be dynamically resized and moved.

Ok. So the rest of the email is mostly about irqs. It'd be nice if it was in 
the other thread (cpuset: irq affinity support) but I'm ok with replying here.

> Using the 'cpus' in one or more cpusets to determine both:
>  1) which CPUs can receive an irq, and
>  2) resolving conflicts in such irq placement,
> excessively overloads the cpuset hierarchy, breaking existing
> userspace, as Paul M notes.
> If you don't have any other cpuset hierarchy you need to use, and
> so don't really otherwise care what your cpuset hierarchy is, then
> I suppose this works just fine.
I'm not sure #2 is a concern. With the latest couset irq handling patches 
conflict resolution is very simple. "irq can belong to a single cpuset at a time".

> But if you also need to use the cpuset hierarchy to define nested
> subsets of CPUs and Memory Nodes, for the purposes of controlling
> which tasks can run where (the original and still primary motivation
> for cpusets) then one can only conveniently specify those trivial
> irq configurations that happen to exactly conform with that hierarchy
> (that exactly want to make use of some of the same sets of CPUs, and
> that don't depend on the hierarchy to resolve conflicts in overlapping
> irq directives).
I do not think we need overlapping irq directives.

> Almost any non-trivial use of cpusets for both irq directivity and CPU
> and Memory placement would complicate both hierarchies, forcing
> unending confusion and breakage on the existing cpuset users.
I'm not sure what breakage you're talking about. But lets talk examples I 
guess. See below.

> Some examples:
>     Let's say I have three cpusets defining the CPU and Memory Node
>     sets in which I want to place my tasks:
> 
> 	    /dev/cpuset/A
> 	    /dev/cpuset/B
> 	    /dev/cpuset/C
> 
>     and I want a particular set of irqs to be directed to the CPUs in A
>     and B, but not C.  Well -- guess I can duplicate the irqs settings.
> 
>     But don't tell me to use a 'boot' cpuset, as in:
> 
> 	    /dev/cpuset/boot/A
> 	    /dev/cpuset/boot/B
> 	    /dev/cpuset/C
> 
>     to accomplish this, as that intrudes in the hierarchy, breaking
>     user code.
> 
>     If my irq isolation needs don't exactly partition along the
>     'cpus' settings in A, B and C, then not even duplication helps.
> 
>     If the 'irqs' in /dev/cpuset/A/Z (where Z's cpus are a proper
>     subset of A's) don't match the 'irqs' in /dev/cpuset/A, then I
>     have further confusions resulting from conflicting irq directives.
How is that any different from tasks ? Exact same example right back at you.
Suppose I have a task that needs to run in A and B but not C. In fact if you 
look at the example that I provided in the other thread I already have such an 
app. In my current apps different threads have to run in different cpusets.

And yes I think the way to solve that is to use more complex cpuset hierarchy 
like the one you used above. I would not necessarily mix in the 'boot' set 
here. I mean if people want to subdivide it that's fine but they do not have 
to. I mean people can just nuke the 'boot' group/set and create something else.

> (If your proposal handles all the above, without forcing changes
> on the cpuset hierarchy, then I misread it - in that case, sorry.)
It does not force any changes. irqs handled just like tasks and if people have 
complex partitioning requirements they may have to use more complicated 
hierarchies.

> Paul M has already proposed pulling apart the binding of CPUs and
> Memory Nodes, in the underlying cgroups, as he apparently has cases in
> which the legacy connection of those two into a single cpuset hierarchy
> is an undesired constraint on (complication of) the hierarchy.
> That's more likely the direction in which we should be proceeding --
> making these hierarchies independent, not entwining them.
> 
> This additional overloading of the current cpuset hierarchy might
> handle the simple case you need.  But that's only because you don't
> have conflicting needs for the cpuset hierarchy.
> 
> Hopefully, Paul M will be able to view with some sense of humor that I
> am complaining that this proposal of yourself (and Peter Z's earlier
> patches) isn't general enough, even as I have complained of some of
> some other recent cgroup proposals of Paul M that their increased
> generality isn't sufficient to justify their subtle incompatibilities.
> 
> At a minimum, as in my proposal (http://lkml.org/lkml/2008/3/6/512) of
> last week, one needs some mechanism independent of the cpuset hierarchy
> to resolve conflicts in these irq directives.  As you may recall,
> that proposal named each set of irqs, let each cpuset specify which
> named set of irqs applied to its CPUs, and encoded the precedence N
> of each named list of irqs in the filename '/dev/cpuset/irqs.N.name'
> of the file listing the irqs in that named set.  Then one can specify
> irqs for each cpuset, and have some way to specify the precedence of
> these irq specifications, without overloading the cpuset hierarchy.
> 
> Even this minimum proposal might be insufficient, if one has needs
> to specify irq directives for sets of CPUs that are not otherwise
> present in the cpuset hierarchy.  Observe that this proposal does
> not handle the next to the last example case above.  I am not yet
> convinced that this deficiency is a show stopper.  It might be.
That (ie additional sets of irqs) seems like an major overkill to me.
Probably because I do not think that there are any conflicts to resolve in the 
first place. As I explained above if we treat irqs just like tasks (from 
cpuset perspective) then same exact rules and limitations apply. Irq can be 
assigned to a single cpuset at a time. Complex requirements can be solved 
either by deeper cgroup/cpuset hierarchies or worst case if there is something 
  totally wacky constraint people always have an option of assigning irq to 
the top cpuset and using /proc/irq/N/smp_affinity interface to select which 
cpus it can run on.

> The other direction considered, making this its own cgroup, -seemed-
> to fail as well, as someone, I forget whom, noted.  Cgroups attach
> tasks to sets of things.  We aren't trying to attach tasks to anything.
> We're trying to attach irqs to CPUs.  We are trying now to treat irqs
> as 'pseudo-tasks', but that forces the irq hierarchy to be a subset
> of the CPU hierarchy, due to overloading the 'cpus' set.  This is the
> problem noted above.
> 
> Paul M -- could we take a different tack here -- extend cgroups to map
> -either- tasks or irqs to the managed resources?  Then irqs would be
> managed by a cgroup hierarchy that mapped irqs to a subsystem specific
> attribute of 'cpus' (resembling the cpuset 'cpus').  If the hierarchy
> one needed for irqs was a nice subset of ones cpuset hierarchy, one
> might even mount both cgroup subsystems on the same mount, so long
> as we could work out what it means for two cgroup subsystems to share
> the same subsystem specific attribute, 'cpus' in this case.
Hold on. How does this help if at the end of the day 'cpus' are still shared 
between the irq and task groups ? We'd still have exact same constrains.

btw I'm starting to gravitate back towards my original solution (ie cpu_map 
that tells which cpus can be used by kernel and irqs). If you remember I was 
totally against using cpusets/cgroup exactly because they are designed to 
handle tasks. You guys convinced me that we can extend them and that it's a 
better way to go about. Yet after weeks of discussion we seem to be taking 
about adding more and more stuff.

Anyway, I think treating irqs as tasks and enforcing the same rules and 
constraints should handle most scenarios even complex ones at the expense of 
deeper cpuset hierarchies. Adding new 'cgroups' or 'sets' just for irqs seems 
overkill.

Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/