Message-ID: <4eddbc8f761c113fb098b81ed4c542827664abb3.camel@siemens.com>
Date: Wed, 11 Sep 2024 07:02:25 +0000
From: "MOESSBAUER, Felix" <felix.moessbauer@...mens.com>
To: "longman@...hat.com" <longman@...hat.com>, "axboe@...nel.dk"
	<axboe@...nel.dk>
CC: "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>, "Schmidt, Adriaan"
	<adriaan.schmidt@...mens.com>, "Bezdeka, Florian"
	<florian.bezdeka@...mens.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "asml.silence@...il.com"
	<asml.silence@...il.com>, "io-uring@...r.kernel.org"
	<io-uring@...r.kernel.org>, "dqminh@...udflare.com" <dqminh@...udflare.com>
Subject: Re: [PATCH v3 2/2] io_uring/io-wq: inherit cpuset of cgroup in io
 worker

On Tue, 2024-09-10 at 13:42 -0400, Waiman Long wrote:
> 
> On 9/10/24 13:11, Felix Moessbauer wrote:
> > The io worker threads are userland threads that just never exit to
> > the
> > userland. By that, they are also assigned to a cgroup (the group of
> > the
> > creating task).
> 
> The io-wq task is not actually assigned to a cgroup. To belong to a 
> cgroup, its pid has to be present to the cgroup.procs of the 
> corresponding cgroup, which is not the case here.

Hi, thanks for jumping in. As said, I'm not too familiar with the
internals of the io worker threads. Nonetheless, the kernel reports
the cgroup assignment consistently across its interfaces, which seems
to contradict your statement above. Example:

pid     tid
648460  648460  SCHED_OTHER   20  S    0  0-1  ./test/wq-aff.t
648460  648461  SCHED_OTHER   20  S    1  1    iou-sqp-648460
648460  648462  SCHED_OTHER   20  S    0  0    iou-wrk-648461

When I now check cgroup.procs, I just see 648460, which is
expected as this is the process (with its main thread). Checking
cgroup.threads shows all three tids.

When checking the other way round, I get the same information:
$ cat /proc/648460/task/648461/cgroup
0::/user.slice/user-1000.slice/session-1.scope
$ cat /proc/648460/task/648462/cgroup
0::/user.slice/user-1000.slice/session-1.scope

Now I'm wondering if it is just presented incorrectly, or if these
tasks indeed belong to the mentioned cgroup?
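
For anyone who wants to repeat the per-thread check above without
running cat for every tid, here is a minimal Python sketch. It assumes
a cgroup v2 (unified hierarchy) system, where the relevant line in
/proc/<pid>/task/<tid>/cgroup starts with "0::". The function names
parse_cgroup_line and thread_cgroups are made up for illustration and
are not part of any kernel or liburing tooling.

```python
from pathlib import Path


def parse_cgroup_line(text):
    """Return the cgroup v2 path from the content of a
    /proc/<pid>/task/<tid>/cgroup file, or None if there is no v2 entry."""
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip anything that is not "id:controllers:path"
        hierarchy_id, controllers, path = parts
        # The unified (v2) hierarchy has id 0 and an empty controller list.
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def thread_cgroups(pid):
    """Map every tid of the given process to the cgroup path the kernel
    reports for it (assumption: /proc is mounted and readable)."""
    result = {}
    for task_dir in Path(f"/proc/{pid}/task").iterdir():
        try:
            content = (task_dir / "cgroup").read_text()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were iterating
        result[int(task_dir.name)] = parse_cgroup_line(content)
    return result
```

Calling thread_cgroups() with the pid of the test process returns one
entry per tid, including the iou-sqp and iou-wrk threads, so any thread
reported in a different cgroup than the main thread stands out.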

> My understanding is
> that you are just restricting the CPU affinity to follow the cpuset
> of 
> the corresponding user task that creates it. The CPU affinity
> (cpumask) 
> is just one of the many resources controlled by a cgroup. That
> probably 
> needs to be clarified.

That's clear. Looking at the bigger picture, I want to ensure that the
io workers do not break out of the cgroup limits (I called it "ambient"
before, similar to the capabilities), because breaking out would violate
the isolation assumption. In our case, we are mostly interested in not
leaving the cpuset, as we use that to partition the system into realtime
and non-realtime parts.
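
To make "not leaving the cpuset" concrete, the check can be sketched
as follows. This is only an illustration under assumptions: cgroup v2
mounted at /sys/fs/cgroup, the cpuset controller enabled so that
cpuset.cpus.effective exists for the cgroup, and CPU lists in the plain
"0-1,4" form. The helper names are hypothetical.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-1,4" into a set of CPU numbers.
    Only plain numbers and ranges are handled in this sketch."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # empty list, e.g. an empty file
        first, _, last = chunk.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def escaped_cpus(thread_cpus, cpuset_cpus):
    """Return the CPUs a thread may run on that lie outside the cpuset."""
    return set(thread_cpus) - set(cpuset_cpus)


def thread_escapes(tid, cgroup_path, cgroup_root="/sys/fs/cgroup"):
    """Compare the affinity of one thread with the effective cpuset of
    its cgroup. cgroup_path is the path from /proc/<pid>/task/<tid>/cgroup,
    e.g. "/user.slice/...". The mount point is an assumption."""
    effective = Path(cgroup_root + cgroup_path) / "cpuset.cpus.effective"
    allowed = parse_cpu_list(effective.read_text())
    # On Linux, sched_getaffinity accepts a thread id as well as a pid.
    return escaped_cpus(os.sched_getaffinity(tid), allowed)
```

An empty result for every tid of the process, io workers included,
means no thread is allowed to run outside the partition.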

> 
> Besides cpumask, the cpuset controller also controls the node mask of
> the memory nodes allowed.

Yes, and that is especially important as some memory nodes can be
"closer" to the I/O devices than others.

Best regards,
Felix

-- 
Siemens AG, Technology
Linux Expert Center

