linux-kernel - Re: [PATCH 0/9] sched: Core scheduling interfaces

Message-ID: <YGyHknFJhHO99e5a@slm.duckdns.org>
Date:   Tue, 6 Apr 2021 12:08:50 -0400
From:   Tejun Heo <tj@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     joel@...lfernandes.org, chris.hyser@...cle.com, joshdon@...gle.com,
        mingo@...nel.org, vincent.guittot@...aro.org,
        valentin.schneider@....com, mgorman@...e.de,
        linux-kernel@...r.kernel.org, tglx@...utronix.de,
        Michal Koutný <mkoutny@...e.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Zefan Li <lizefan.x@...edance.com>
Subject: Re: [PATCH 0/9] sched: Core scheduling interfaces

Hello,

On Tue, Apr 06, 2021 at 05:32:04PM +0200, Peter Zijlstra wrote:
> > I find it difficult to like the proposed interface from the name (the term
> > "core" is really confusing given how the word tends to be used internally)
> > to the semantics (it isn't like anything else) and even the functionality
> > (we're gonna have fixed processors at some point, right?).
> 
> Core is the topological name for the thing that hosts the SMT threads.
> Can't really help that.

I find the name pretty unfortunate given how overloaded the term is, both
generally and in the kernel, but oh well...

> > Here are some preliminary thoughts:
> > 
> > * Are both prctl and cgroup based interfaces really necessary? I could be
> >   being naive but given that we're (hopefully) working around hardware
> >   deficiencies which will go away in time, I think there's a strong case for
> >   minimizing at least the interface to the bare minimum.
> 
> I'm not one for cgroups much, so I'll let others argue that case, except
> that per systemd and all the other new fangled shit, people seem to use
> cgroups a lot to group tasks. So it makes sense to also expose this
> through cgroups in some form.

All the new fangled things follow a certain usage pattern which makes
aligning parts of the process tree with the cgroup layout trivial. So when
restrictions can be applied along the process tree like this and there isn't
a particular need for dynamic hierarchical control, there isn't much need
for a direct cgroup interface.

> That said; I've had requests from lots of non security folks about this
> feature to help mitigate the SMT interference.
> 
> Consider for example Real-Time. If you have an active SMT sibling, the
> CPU performance is much less than it would be when the SMT sibling is
> idle. Therefore, for the benefit of determinism, it would be very nice
> if RT tasks could force-idle their SMT siblings, and voila, this
> interface allows exactly that.
> 
> The same is true for other workloads that care about interference.

I see.
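
To make the force-idle idea above concrete, here is a toy model of the
co-scheduling rule, not the kernel implementation. The "tag" attached to each
task and the function name schedule_core are hypothetical, introduced only for
illustration; the sketch assumes the highest-priority runnable task picks the
tag for the whole core and every SMT sibling with a different tag is forced
idle.

```python
def schedule_core(candidates):
    """Toy model of SMT co-scheduling on one core.

    candidates: list of (task_name, tag) pairs, one per SMT thread,
    ordered with the highest-priority task first. The 'tag' is a
    hypothetical grouping label, not a real kernel field.

    Returns a list with one entry per SMT thread: the task name if it
    may run, or None if that sibling is forced idle.
    """
    if not candidates:
        return []
    # The highest-priority task decides which tag may run on this core.
    _, lead_tag = candidates[0]
    return [name if tag == lead_tag else None for name, tag in candidates]


if __name__ == "__main__":
    # An RT task and an unrelated batch task: the batch sibling idles.
    print(schedule_core([("rt", "A"), ("batch", "B")]))
    # Two tasks sharing a tag may run concurrently.
    print(schedule_core([("rt", "A"), ("rt-helper", "A")]))
```

Affinity is untouched in this model; only concurrency on siblings is
restricted.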

> >   Given how cgroups are set up (membership operations happening only for
> >   seeding, especially with the new clone interface), it isn't too difficult
> >   to synchronize process tree and cgroup hierarchy where it matters - ie.
> >   given the right per-process level interface, restricting configuration for
> >   a cgroup sub-hierarchy may not need any cgroup involvement at all. This
> >   also nicely gets rid of the interaction between prctl and cgroup bits.
> 
> I've no idea what you mean :/ The way I use cgroups (when I have to, for
> testing) is to echo the pid into /cgroup/foo/tasks. No clone or anything
> involved.

The usage pattern is creating a new cgroup, seeding it with a process
(either writing to tasks or using CLONE_INTO_CGROUP) and letting it continue
only on that sub-hierarchy, so the cgroup hierarchy usually partially
overlays process trees.
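
The seeding step in that pattern can be sketched roughly as follows. This is a
minimal illustration assuming a cgroup v2 style layout where membership is
changed by writing a PID to cgroup.procs (cgroup v1 uses a tasks file
instead). The helper name seed_cgroup and the paths are hypothetical, and the
demo runs against a temporary directory rather than a real cgroupfs mount.

```python
import os
import tempfile


def seed_cgroup(cgroup_root, name, pid):
    """Create a cgroup directory and move one process into it.

    cgroup_root: path where the cgroup hierarchy is mounted (assumed).
    name:        name of the new cgroup to create under cgroup_root.
    pid:         process to seed the cgroup with.

    Children forked by that process afterwards stay in the same cgroup,
    which is why the cgroup hierarchy ends up overlaying the process tree.
    """
    path = os.path.join(cgroup_root, name)
    os.makedirs(path, exist_ok=True)
    # On a real cgroup v2 mount this file is created by the kernel;
    # writing a PID to it migrates that process into the cgroup.
    with open(os.path.join(path, "cgroup.procs"), "w") as f:
        f.write(str(pid))
    return path


if __name__ == "__main__":
    # Demo against a scratch directory so no privileges are needed.
    fake_root = tempfile.mkdtemp()
    created = seed_cgroup(fake_root, "foo", os.getpid())
    print("seeded", created)
```

After this single membership operation, nothing else needs to touch the
cgroup for the sub-hierarchy to track the seeded process and its descendants.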

> None of my test machines come up with cgroupfs mounted, and any and all
> cgroup setup is under my control.
>
> > * If we *have* to have cgroup interface, I wonder whether this would fit a
> >   lot better as a part of cpuset. If you squint just right, this can be
> >   viewed as some dynamic form of cpuset. Implementation-wise, it probably
> >   won't integrate with the rest but I think the feature will be less jarring
> >   as a part of cpuset, which already is a bit of kitchensink anyway.
> 
> Not sure I agree, we do not change the affinity of things, we only
> control who's allowed to run concurrently on SMT siblings. There could
> be a cpuset partition split between the siblings and it would still work
> fine.

I see. Yeah, if we really need it, I'm not sure it fits in the cgroup
interface proper. As I wrote elsewhere, these things are usually implemented
on the originating subsystem's interface with a cgroup ID as a parameter.

Thanks.

-- 
tejun