linux-kernel - Re: [PATCH v2 2/6] cgroup/cpuset: Clarify the use of invalid partition root

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YPHwG61qGDa3h6Wg@mtj.duckdns.org>
Date:   Fri, 16 Jul 2021 10:46:19 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Waiman Long <llong@...hat.com>
Cc:     Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-kselftest@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <guro@...com>, Phil Auld <pauld@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>
Subject: Re: [PATCH v2 2/6] cgroup/cpuset: Clarify the use of invalid
 partition root

Hello, Waiman.

On Fri, Jul 16, 2021 at 04:08:15PM -0400, Waiman Long wrote:
> > > I agree with you on principle. However, the reason why there are
> > > more restrictions on enabling partition is because I want to avoid
> > > forcing the users to always read back cpuset.partition.type to see
> > > if the operation succeeds instead of just getting an error from the
> > > operation. The former approach is more error prone. If you don't
> > > want changes in existing behavior, I can relax the checking and
> > > allow them to become an invalid partition if an illegal operation
> > > happens.
> > > 
> > > Also there is now another cpuset patch to extend cpu isolation to
> > > cgroup v1 [1]. I think it is better suit to the cgroup v2 partition
> > > scheme, but cgroup v1 is still quite heavily out there.
> > > 
> > > Please let me know what you want me to do and I will send out a v3
> > > version.
> > 
> > Note that the current cpuset partition implementation have implemented
> > some restrictions on when a partition can be enabled. However, I missed
> > some corner cases in the original implementation that allow certain
> > cpuset operations to make a partition invalid. I tried to plug those
> > holes in this patchset. However, if maintaining backward compatibility
> > is more important, I can leave those holes and update the documentation
> > to make sure that people check cpuset.partition.type to confirm if their
> > operation succeeds.
> 
> I just realize that partition root set the CPU_EXCLUSIVE bit. So changes to
> cpuset.cpus that break exclusivity rule is not allowed anyway. This patchset
> is just adding additional checks so that cpuset.cpus changes that break the
> partition root rules will not be allowed. I can remove those additional
> checks for this patchset and allow cpuset.cpus changes that break the
> partition root rules to make it invalid instead. However, I still want
> invalid changes to cpuset.partition.type to be disallowed.

So, I get the instinct to disallow these operations and it'd make sense if
the conditions aren't reachable otherwise. However, I'm afraid what users
eventually get is false sense of security rather than any actual guarantee.

Inconsistencies like this cause actual usability hazards - e.g. imagine a
system config script whic sets up exclusive cpuset and let's say that the
use case is fine with degraded operation when the target cores are offline
(e.g. energy save mode w/ only low power cores online). Let's say this
script runs in late stages during boot and has been reliable. However, at
some point, there are changes in boot sequence and now there's low but
non-trivial chance that the system would already be in low power state when
the script runs. Now the script will fail sporadically and the whole thing
would be pretty awkward to debug.

I'd much prefer to have an explicit interface to confirm the eventual state
and a way to monitor state transitions (without polling). An invalid state
is an inherent part of cpuset configuration. I'd much rather have that
really explicit in the interface even if that means a bit of extra work at
configuration time.

Thanks.

-- 
tejun