Message-ID: <f75859e0-04d4-3da2-8df0-eb8841623a7c@redhat.com>
Date: Fri, 13 Oct 2023 12:03:18 -0400
From: Waiman Long <longman@...hat.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
Christian Brauner <brauner@...nel.org>,
Jonathan Corbet <corbet@....net>,
Shuah Khan <shuah@...nel.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Giuseppe Scrivano <gscrivan@...hat.com>
Subject: Re: [PATCH v8 0/7] cgroup/cpuset: Support remote partitions
On 10/13/23 11:50, Michal Koutný wrote:
> Hello.
>
> (I know this is heading for 6.7. Still, I wanted to look at it once it
> had stabilized, to understand the new concept better, and I still have
> some questions below.)
>
> On Tue, Sep 05, 2023 at 09:32:36AM -0400, Waiman Long <longman@...hat.com> wrote:
>> Both scheduling and isolated partitions can be formed as a remote
>> partition. A local partition can be created under a remote partition.
>> A remote partition, however, cannot be formed under a local partition
>> for now.
>>
>>
>> With this patch series, we allow the creation of remote partitions
>> far from the root. The container management tool can manage the
>> "cpuset.cpus.exclusive" file without impacting the other cpuset
>> files that are managed by other middleware. Of course, invalid
>> "cpuset.cpus.exclusive" values will be rejected.
> Take the example of a nested cgroup `cont` to which I want to
> dedicate two CPUs (0 and 1).
> IIUC, I can do this either with a chain of local root partitions or with
> a single remote partition.
>
>
> [chain]
>   root
>   |    \
>   mid1a                       mid1b
>   cpuset.cpus=0-1             cpuset.cpus=2-15
>   cpuset.cpus.partition=root
>   |
>   mid2
>   cpuset.cpus=0-1
>   cpuset.cpus.partition=root
>   |
>   cont
>   cpuset.cpus=0-1
>   cpuset.cpus.partition=root
In this case, the effective CPUs of both mid1a and mid2 will be empty,
since all of their CPUs are passed down to the child partition root.
IOW, you can't have any tasks in these two cpusets.
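A rough sketch of why, under a simplified model that is an assumption rather than the kernel's actual cpuset code: a partition root keeps as effective CPUs only its cpuset.cpus minus the CPUs granted to its child partition roots (hotplug, validation and error states are ignored). The helpers parse_cpulist() and effective_cpus() are hypothetical and exist only for illustration:

```python
# Simplified model (an assumption, not the kernel's cpuset code): a partition
# root's effective CPUs are its cpuset.cpus minus the CPUs granted to its child
# partition roots. Hotplug, validation, and error states are ignored.


def parse_cpulist(text: str) -> set[int]:
    """Parse a kernel-style CPU list such as "0-1,4" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def effective_cpus(cpus: str, child_partitions: list[str]) -> set[int]:
    """CPUs left for tasks in this cgroup after child partition roots take theirs."""
    granted: set[int] = set()
    for child in child_partitions:
        granted |= parse_cpulist(child)
    return parse_cpulist(cpus) - granted


# [chain]: every level has cpuset.cpus=0-1 and hands all of it to the child below.
chain = {
    "mid1a": effective_cpus("0-1", ["0-1"]),  # child mid2 is a partition root
    "mid2": effective_cpus("0-1", ["0-1"]),   # child cont is a partition root
    "cont": effective_cpus("0-1", []),        # leaf partition keeps its CPUs
}
for name, cpus in chain.items():
    print(name, sorted(cpus))
```

Because every level in [chain] passes all of 0-1 down, this model leaves nothing for tasks in mid1a or mid2, matching the point above.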
>
> [remote]
>   root
>   |    \
>   mid1a                       mid1b
>   cpuset.cpus.exclusive=0-1   cpuset.cpus=2-15
>   |
>   mid2
>   cpuset.cpus.exclusive=0-1
>   |
>   cont
>   cpuset.cpus.exclusive=0-1
>   cpuset.cpus.partition=root
>
> In the former case, I must configure cpuset.cpus and
> cpuset.cpus.partition along the whole path; in the latter case, I still
> configure cpuset.cpus.exclusive along the whole path, but set the
> partition root only at the bottom.
>
> What is the difference between the two configs above?
> (Or can you please give an example where the remote partitions are
> better illustrated?)
For the remote case, you can have intermediate tasks in both mid1a and
mid2 as long as their cpuset.cpus contains more CPUs than their
cpuset.cpus.exclusive.
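Under a similarly simplified model (again an assumption, not the kernel code), the CPUs in the remote partition's cpuset.cpus.exclusive are taken out of the cgroups above it, and each intermediate cgroup keeps the rest of its own cpuset.cpus for tasks. The cpuset.cpus values 0-3 and 0-2 below are hypothetical, and parse_cpulist() and intermediate_effective() are illustrative helpers only:

```python
# Simplified model (an assumption, not the kernel's cpuset code): the CPUs in a
# remote partition's cpuset.cpus.exclusive are taken out of every cgroup above
# it, so an intermediate cgroup keeps the rest of its own cpuset.cpus for tasks.


def parse_cpulist(text: str) -> set[int]:
    """Parse a kernel-style CPU list such as "0-1,4" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def intermediate_effective(cpus: str, remote_exclusive: str) -> set[int]:
    """CPUs left for tasks in a cgroup sitting above a remote partition."""
    return parse_cpulist(cpus) - parse_cpulist(remote_exclusive)


# [remote] with hypothetical cpuset.cpus values wider than the exclusive 0-1
# that mid1a and mid2 pass down to cont.
remote = {
    "mid1a": intermediate_effective("0-3", "0-1"),
    "mid2": intermediate_effective("0-2", "0-1"),
    "cont": parse_cpulist("0-1"),  # the remote partition root gets 0-1
}
for name, cpus in remote.items():
    print(name, sorted(cpus))
```

In this model mid1a keeps CPUs 2-3 and mid2 keeps CPU 2 for its own tasks, while cont holds 0-1 as the remote partition; unlike [chain], the intermediate cgroups are not emptied.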
> <snip>
>> Modern container orchestration tools like Kubernetes use the cgroup
>> hierarchy to manage different containers, and they rely on other
>> middleware like systemd to help manage it. If a container needs to
>> use isolated CPUs, it is hard to get them with local partitions,
>> as that requires the administrative parent cgroup to be a partition
>> root too, which tools like systemd may not be ready to manage.
> Such tools aren't ready to manage cpuset.cpus.exclusive either, are they?
> IOW, tools need to distinguish exclusive and "shared" CPUs, which is
> equivalent to distinguishing root and member partitions.
They will be ready eventually. The requirement for remote partitions
actually came from our OpenShift team, as local partitions alone did
not meet their needs. They don't need access to exclusive CPUs in the
parent cgroup layer for their management daemons, but they do need to
activate isolated partitions in selected child cgroups so that our
Telco customers can run workloads like DPDK.
So they will add the support to upstream Kubernetes.
Cheers,
Longman