Message-ID: <7dc9cf67-b482-a723-c779-14c7598e1869@redhat.com>
Date:   Wed, 1 Nov 2023 14:14:38 -0400
From:   Waiman Long <longman@...hat.com>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Christian Brauner <brauner@...nel.org>,
        Jonathan Corbet <corbet@....net>,
        Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Giuseppe Scrivano <gscrivan@...hat.com>
Subject: Re: [PATCH v8 0/7] cgroup/cpuset: Support remote partitions

On 10/24/23 12:13, Michal Koutný wrote:
> On Fri, Oct 13, 2023 at 12:03:18PM -0400, Waiman Long <longman@...hat.com> wrote:
>>> [chain]
>>>     root
>>>     |                           \
>>>     mid1a                        mid1b
>>>      cpuset.cpus=0-1              cpuset.cpus=2-15
>>>      cpuset.cpus.partition=root
>>>     |
>>>     mid2
>>>      cpuset.cpus=0-1
>>>      cpuset.cpus.partition=root
>>>     |
>>>     cont
>>>      cpuset.cpus=0-1
>>>      cpuset.cpus.partition=root
>> In this case, the effective CPUs of both mid1a and mid2 will be empty. IOW,
>> you can't have any task in these 2 cpusets.
> I see, that is relevant to a threaded subtree only where the admin / app
> can know how to distribute CPUs and place threads to internal nodes.
>
>> For the remote case, you can have intermediate tasks in both mid1a and mid2
>> as long as cpuset.cpus contains more CPUs than cpuset.cpus.exclusive.
> It's obvious that cpuset.cpus.exclusive should be exclusive among
> siblings.
> Should it also be so along the vertical path?

Sorry for the late reply. I forgot to respond earlier.

We don't support that vertical exclusive check in cgroup v1 
cpuset.cpu_exclusive.
>    root
>    |
>    mid1a
>     cpuset.cpus=0-2
>     cpuset.cpus.exclusive=0
>    |
>    mid2
>     cpuset.cpus=0-2
>     cpuset.cpus.exclusive=1
>    |
>    cont
>     cpuset.cpus=0-2
>     cpuset.cpus.exclusive=2
>     cpuset.cpus.partition=root
>
> IIUC, this should be a valid config regardless of cpuset.cpus.partition
> setting on mid1a and mid2.
> Whereas
>
>    root
>    |
>    mid1a
>     cpuset.cpus=0-2
>     cpuset.cpus.exclusive=0
>    |
>    mid2
>     cpuset.cpus=0-2
>     cpuset.cpus.exclusive=1-2
>     cpuset.cpus.partition=root
>    |
>    cont
>     cpuset.cpus=1-2
>     cpuset.cpus.exclusive=1-2
>     cpuset.cpus.partition=root
>
> Here, I'm hesitating, will mid2 have any exclusively owned cpus?
>
> (I have flashes of understanding cpus.exclusive as being a more
> expressive mechanism than partitions. OTOH, it seems non-intuitive when
> both are combined, thus I'm asking to internalize it better.
> Should partitions be deprecated for simplicity? They're still good to
> provide the notification mechanism of invalidation.
> cpuset.cpus.exclusive.effective doesn't have that.)

cpuset.cpus.exclusive follows the same hierarchical rule as cpuset.cpus.
IOW, CPUs in cpuset.cpus.exclusive are ignored if they are not present in
the ancestor nodes. The value in cpuset.cpus.exclusive shows the intent of
the user, while cpuset.cpus.exclusive.effective shows the real exclusive
CPUs when the partition is enabled. So we just can't use
cpuset.cpus.exclusive as a replacement for cpuset.cpus.partition.

As a result, we can't actually support the vertical CPU exclusion as you 
suggest above.
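
To illustrate the hierarchical rule described above, here is a toy model in
Python. It is only a sketch of the rule as stated in this message (CPUs
requested in cpuset.cpus.exclusive are dropped when the ancestor does not
also carry them), not the kernel's actual algorithm. The helper names
parse_cpulist() and surviving_exclusive() are hypothetical, and the model
assumes every cgroup in the chain sets cpuset.cpus.exclusive explicitly.

```python
def parse_cpulist(text):
    """Parse a CPU list string such as "0-2,5" into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def surviving_exclusive(chain):
    """Toy model (assumption, not kernel code).

    chain lists the cpuset.cpus.exclusive values from the topmost non-root
    cgroup down to the leaf. Each level keeps only the CPUs that also
    survived in its parent.
    """
    result = []
    allowed = None  # the root cgroup imposes no restriction in this model
    for requested in chain:
        cpus = parse_cpulist(requested)
        if allowed is not None:
            cpus &= allowed
        result.append(cpus)
        allowed = cpus
    return result


# The two "vertical" layouts quoted above: mid1a, mid2, cont
first = surviving_exclusive(["0", "1", "2"])
second = surviving_exclusive(["0", "1-2", "1-2"])
# A nested layout where each child requests a subset of its parent
nested = surviving_exclusive(["0-2", "1-2", "2"])
```

In this model both quoted layouts leave mid2 and cont with no exclusive
CPUs, because their requested CPUs are absent from mid1a's set. Only the
nested layout, where each child requests a subset of its parent, keeps
exclusive CPUs all the way down.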

>
>> They will be ready eventually. This requirement of remote partition actually
>> came from our OpenShift team as the use of just local partition did not meet
>> their need. They don't need access to exclusive CPUs in the parent cgroup
>> layer for their management daemons. They do need to activate isolated
>> partition in selected child cgroups to support our Telco customers to run
>> workloads like DPDK.
>>
>> So they will add the support to upstream Kubernetes.
> Is it worth implementing anything touching (ancestral)
> cpuset.cpus.partition then?

I don't quite get what you want to ask here.

Cheers,
Longman
