Message-ID: <84b4ab9d1402e4308bf4738e2c53203975ab855a.camel@codethink.co.uk>
Date: Tue, 03 Jun 2025 13:18:23 +0200
From: Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>
To: luca abeni <luca.abeni@...tannapisa.it>
Cc: Juri Lelli <juri.lelli@...hat.com>, linux-kernel@...r.kernel.org,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Vineeth Pillai <vineeth@...byteword.org>
Subject: Re: SCHED_DEADLINE tasks missing their deadline with
 SCHED_FLAG_RECLAIM jobs in the mix (using GRUB)

Hi Luca

Thank you very much!

On Fri, 2025-05-30 at 11:21 +0200, luca abeni wrote:
> Hi Marcel,
> 
> On Sun, 25 May 2025 21:29:05 +0200
> Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk> wrote:
> [...]
> > > How do you configure systemd? I am having trouble reproducing
> > > your AllowedCPUs configuration... This is an example of what I am
> > > trying:
> > > 
> > > sudo systemctl set-property --runtime custom-workload.slice AllowedCPUs=1
> > > sudo systemctl set-property --runtime init.scope AllowedCPUs=0,2,3
> > > sudo systemctl set-property --runtime system.slice AllowedCPUs=0,2,3
> > > sudo systemctl set-property --runtime user.slice AllowedCPUs=0,2,3
> > > 
> > > and then I try to run a SCHED_DEADLINE application with
> > > 
> > > sudo systemd-run --scope -p Slice=custom-workload.slice <application>  
> > 
> > We just use a bunch of systemd configuration files as follows:
> > 
> > [root@...alhost ~]# cat /lib/systemd/system/monitor.slice
> > # Copyright (C) 2024 Codethink Limited
> > # SPDX-License-Identifier: GPL-2.0-only
> [...]
> 
> So, I copied your *.slice files into /lib/systemd/system (and I added
> them to the "Wants=" entry of /lib/systemd/system/slices.target,
> otherwise the slices are not created), but I am still unable to run
> SCHED_DEADLINE applications in these slices.

We just symlink them there, e.g.:

[root@...alhost ~]# ls -l /etc/systemd/system/slices.target.wants/safety1.slice
lrwxrwxrwx 1 root root 37 Nov 10  2011 /etc/systemd/system/slices.target.wants/safety1.slice ->
/usr/lib/systemd/system/safety1.slice

BTW: /lib is just symlinked to /usr/lib in our setup.
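
For reference, since all of our slice units carry an [Install] section with
WantedBy=slices.target, the same link can also be created by systemd itself:

systemctl enable safety1.slice

which drops the symlink into /etc/systemd/system/slices.target.wants/.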

> This is due to the fact that the kernel does not create a new root
> domain for these cpusets (probably because the cpusets' CPUs are not
> exclusive and the cpuset is not "isolated": for example,
> /sys/fs/cgroup/safety1.slice/cpuset.cpus.partition is set to "member",
> not to "isolated").

Not sure, but for me it is indeed "root", e.g.:

[root@...alhost ~]# cat /sys/fs/cgroup/safety1.slice/cpuset.cpus.partition
root
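
In case it helps with reproducing our setup: the partition type can also be
flipped by hand through the cgroup v2 interface. The exact exclusivity
requirements vary by kernel version, so treat this as a sketch rather than a
recipe:

# inspect the current partition type ("member", "root" or "isolated")
cat /sys/fs/cgroup/safety1.slice/cpuset.cpus.partition
# request an isolated partition for the slice's CPUs
echo isolated > /sys/fs/cgroup/safety1.slice/cpuset.cpus.partition
# if the request cannot be honored, the file reports the reason, e.g.
# "isolated invalid (Cpu list in cpuset.cpus not exclusive)"
cat /sys/fs/cgroup/safety1.slice/cpuset.cpus.partition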

> So, the "cpumask_subset(span, p->cpus_ptr)" check in
> __sched_setscheduler() still evaluates to false and the syscall returns -EPERM.
> 
> 
> Since I do not know how to obtain an isolated cpuset with cgroup v2 and
> systemd, I tried using the old cgroup v1, as described in the
> SCHED_DEADLINE documentation.

I would have thought it should not make any difference whether cgroup v1 or v2 is used, but then who knows.
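
For comparison, the cgroup v1 cpuset setup that the SCHED_DEADLINE
documentation (Documentation/scheduler/sched-deadline.rst) walks through looks
roughly like the following (paraphrased from memory; mount point and CPU
numbers are illustrative):

mkdir /dev/cpuset
mount -t cgroup -o cpuset cpuset /dev/cpuset
cd /dev/cpuset
mkdir cpu0
echo 0 > cpu0/cpuset.cpus
echo 0 > cpu0/cpuset.mems
# make the partition exclusive and stop load balancing across the root
echo 1 > cpuset.cpu_exclusive
echo 0 > cpuset.sched_load_balance
echo 1 > cpu0/cpuset.cpu_exclusive
echo 1 > cpu0/cpuset.mem_exclusive
# move the current shell into the new cpuset
echo $$ > cpu0/tasks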

> This worked fine, and enabling SCHED_FLAG_RECLAIM actually reduced the
> number of missed deadlines (I tried with a set of periodic tasks having
> the same parameters as the ones you described). So, it looks like
> reclaiming is working correctly (at least, as far as I can see) when
> using cgroup v1 to configure the CPU partitions... Maybe there is some
> bug triggered by cgroup v2,

Could be. Either way, it would be good to also update the SCHED_DEADLINE documentation to cover cgroup v2.
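
For anyone trying to reproduce the admission failure quoted earlier: glibc has
no sched_setattr() wrapper, so a minimal probe has to go through syscall(2).
The sketch below uses illustrative 10ms/100ms parameters, not our actual
workload, and needs root or CAP_SYS_NICE:

/*
 * dl-probe.c: try to enter SCHED_DEADLINE with GRUB reclaiming enabled.
 *
 * Build: gcc -o dl-probe dl-probe.c
 * Run inside a slice, e.g.:
 *   systemd-run --scope -p Slice=safety1.slice ./dl-probe
 * An EPERM failure here points at the cpumask_subset() admission check.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

#define SCHED_DEADLINE		6
#define SCHED_FLAG_RECLAIM	0x02

/* not exported by glibc; layout as in include/uapi/linux/sched/types.h */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

int main(void)
{
	struct sched_attr attr = {
		.size           = sizeof(attr),
		.sched_policy   = SCHED_DEADLINE,
		.sched_flags    = SCHED_FLAG_RECLAIM,
		.sched_runtime  = 10 * 1000 * 1000,	/* 10 ms */
		.sched_deadline = 100 * 1000 * 1000,	/* 100 ms */
		.sched_period   = 100 * 1000 * 1000,	/* 100 ms */
	};

	if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
		fprintf(stderr, "sched_setattr: %s\n", strerror(errno));
		return 1;
	}
	puts("SCHED_DEADLINE + SCHED_FLAG_RECLAIM accepted");
	return 0;
}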

> or maybe I am misunderstanding your setup.

No, there should be nothing else special really.

> I think the experiment suggested by Juri can help in understanding
> where the issue can be.

Yes, I already did all that and hope you guys can get some insights from that experiment.

And remember, if I can help in any other way just let me know. Thanks!

> 			Thanks,
> 				Luca
> 
> 
> > [Unit]
> > Description=Prioritized slice for the safety monitor.
> > Before=slices.target
> > 
> > [Slice]
> > CPUWeight=1000
> > AllowedCPUs=0
> > MemoryAccounting=true
> > MemoryMin=10%
> > ManagedOOMPreference=omit
> > 
> > [Install]
> > WantedBy=slices.target
> > 
> > [root@...alhost ~]# cat /lib/systemd/system/safety1.slice
> > # Copyright (C) 2024 Codethink Limited
> > # SPDX-License-Identifier: GPL-2.0-only
> > [Unit]
> > Description=Slice for Safety case processes.
> > Before=slices.target
> > 
> > [Slice]
> > CPUWeight=1000
> > AllowedCPUs=1
> > MemoryAccounting=true
> > MemoryMin=10%
> > ManagedOOMPreference=omit
> > 
> > [Install]
> > WantedBy=slices.target
> > 
> > [root@...alhost ~]# cat /lib/systemd/system/safety2.slice
> > # Copyright (C) 2024 Codethink Limited
> > # SPDX-License-Identifier: GPL-2.0-only
> > [Unit]
> > Description=Slice for Safety case processes.
> > Before=slices.target
> > 
> > [Slice]
> > CPUWeight=1000
> > AllowedCPUs=2
> > MemoryAccounting=true
> > MemoryMin=10%
> > ManagedOOMPreference=omit
> > 
> > [Install]
> > WantedBy=slices.target
> > 
> > [root@...alhost ~]# cat /lib/systemd/system/safety3.slice
> > # Copyright (C) 2024 Codethink Limited
> > # SPDX-License-Identifier: GPL-2.0-only
> > [Unit]
> > Description=Slice for Safety case processes.
> > Before=slices.target
> > 
> > [Slice]
> > CPUWeight=1000
> > AllowedCPUs=3
> > MemoryAccounting=true
> > MemoryMin=10%
> > ManagedOOMPreference=omit
> > 
> > [Install]
> > WantedBy=slices.target
> > 
> > [root@...alhost ~]# cat /lib/systemd/system/system.slice 
> > # Copyright (C) 2024 Codethink Limited
> > # SPDX-License-Identifier: GPL-2.0-only
> > 
> > #
> > # This slice will control all processes started by systemd by
> > # default.
> > #
> > 
> > [Unit]
> > Description=System Slice
> > Documentation=man:systemd.special(7)
> > Before=slices.target
> > 
> > [Slice]
> > CPUQuota=150%
> > AllowedCPUs=0
> > MemoryAccounting=true
> > MemoryMax=80%
> > ManagedOOMSwap=kill
> > ManagedOOMMemoryPressure=kill
> > 
> > [root@...alhost ~]# cat /lib/systemd/system/user.slice 
> > # Copyright (C) 2024 Codethink Limited
> > # SPDX-License-Identifier: GPL-2.0-only
> > 
> > #
> > # This slice will control all processes started by systemd-logind
> > #
> > 
> > [Unit]
> > Description=User and Session Slice
> > Documentation=man:systemd.special(7)
> > Before=slices.target
> > 
> > [Slice]
> > CPUQuota=25%
> > AllowedCPUs=0
> > MemoryAccounting=true
> > MemoryMax=80%
> > ManagedOOMSwap=kill
> > ManagedOOMMemoryPressure=kill
> > 
> > > However, this does not work because systemd is not creating an
> > > isolated cpuset... So, the root domain still contains CPUs 0-3, and
> > > the "custom-workload.slice" cpuset only has CPU 1. Hence, the check
> > >                         /*
> > >                          * Don't allow tasks with an affinity mask smaller than
> > >                          * the entire root_domain to become SCHED_DEADLINE. We
> > >                          * will also fail if there's no bandwidth available.
> > >                          */
> > >                         if (!cpumask_subset(span, p->cpus_ptr) ||
> > >                             rq->rd->dl_bw.bw == 0) {
> > >                                 retval = -EPERM;
> > >                                 goto unlock;
> > >                         }
> > > in __sched_setscheduler() fails.
> > > 
> > > 
> > > How are you configuring the cpusets?  
> > 
> > See above.
> > 
> > > Also, which kernel version are you using?
> > > (sorry if you already posted this information in previous emails
> > > and I am missing something obvious)  
> > 
> > Not even sure whether I explicitly mentioned it; other than that, we
> > are always running the latest stable.
> > 
> > Two months ago, when we last ran some extensive tests on this, it was
> > actually v6.13.6.
> > 
> > > 			Thanks,  
> > 
> > Thank you!
> > 
> > > 				Luca

Cheers

Marcel
