Message-ID: <c91a117401225290fbf0390f2ce78c3e0fb3b2d5.camel@codethink.co.uk>
Date: Sun, 25 May 2025 21:29:05 +0200
From: Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk>
To: luca abeni <luca.abeni@...tannapisa.it>
Cc: Juri Lelli <juri.lelli@...hat.com>, linux-kernel@...r.kernel.org, Ingo
 Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Vineeth
 Pillai	 <vineeth@...byteword.org>
Subject: Re: SCHED_DEADLINE tasks missing their deadline with
 SCHED_FLAG_RECLAIM jobs in the mix (using GRUB)

Hi Luca

On Fri, 2025-05-23 at 21:46 +0200, luca abeni wrote:
> Hi Marcel,
> 
> sorry, but I have some additional questions to fully understand your
> setup...

No problem, I am happy to answer any questions :)

> On Mon, 19 May 2025 15:32:27 +0200
> Marcel Ziswiler <marcel.ziswiler@...ethink.co.uk> wrote:
> [...]
> > > just a quick question to better understand your setup (and check
> > > where the issue comes from):
> > > in the email below, you say that tasks are statically assigned to
> > > cores; how did you do this? Did you use isolated cpusets,  
> > 
> > Yes, we use the cpuset controller from the cgroup-v2 APIs in the
> > linux kernel in order to partition CPUs and memory nodes. In detail,
> > we use the AllowedCPUs and AllowedMemoryNodes in systemd's slice
> > configurations.
> 
> How do you configure systemd? I am having trouble reproducing your
> AllowedCPUs configuration... This is an example of what I am trying:
> 	sudo systemctl set-property --runtime custom-workload.slice AllowedCPUs=1
> 	sudo systemctl set-property --runtime init.scope AllowedCPUs=0,2,3
> 	sudo systemctl set-property --runtime system.slice AllowedCPUs=0,2,3
> 	sudo systemctl set-property --runtime user.slice AllowedCPUs=0,2,3
> and then I try to run a SCHED_DEADLINE application with
> 	sudo systemd-run --scope -p Slice=custom-workload.slice <application>

We just use a bunch of systemd configuration files as follows:

[root@...alhost ~]# cat /lib/systemd/system/monitor.slice
# Copyright (C) 2024 Codethink Limited
# SPDX-License-Identifier: GPL-2.0-only
[Unit]
Description=Prioritized slice for the safety monitor.
Before=slices.target

[Slice]
CPUWeight=1000
AllowedCPUs=0
MemoryAccounting=true
MemoryMin=10%
ManagedOOMPreference=omit

[Install]
WantedBy=slices.target

[root@...alhost ~]# cat /lib/systemd/system/safety1.slice
# Copyright (C) 2024 Codethink Limited
# SPDX-License-Identifier: GPL-2.0-only
[Unit]
Description=Slice for Safety case processes.
Before=slices.target

[Slice]
CPUWeight=1000
AllowedCPUs=1
MemoryAccounting=true
MemoryMin=10%
ManagedOOMPreference=omit

[Install]
WantedBy=slices.target

[root@...alhost ~]# cat /lib/systemd/system/safety2.slice
# Copyright (C) 2024 Codethink Limited
# SPDX-License-Identifier: GPL-2.0-only
[Unit]
Description=Slice for Safety case processes.
Before=slices.target

[Slice]
CPUWeight=1000
AllowedCPUs=2
MemoryAccounting=true
MemoryMin=10%
ManagedOOMPreference=omit

[Install]
WantedBy=slices.target

[root@...alhost ~]# cat /lib/systemd/system/safety3.slice
# Copyright (C) 2024 Codethink Limited
# SPDX-License-Identifier: GPL-2.0-only
[Unit]
Description=Slice for Safety case processes.
Before=slices.target

[Slice]
CPUWeight=1000
AllowedCPUs=3
MemoryAccounting=true
MemoryMin=10%
ManagedOOMPreference=omit

[Install]
WantedBy=slices.target

[root@...alhost ~]# cat /lib/systemd/system/system.slice 
# Copyright (C) 2024 Codethink Limited
# SPDX-License-Identifier: GPL-2.0-only

#
# This slice will control all processes started by systemd by
# default.
#

[Unit]
Description=System Slice
Documentation=man:systemd.special(7)
Before=slices.target

[Slice]
CPUQuota=150%
AllowedCPUs=0
MemoryAccounting=true
MemoryMax=80%
ManagedOOMSwap=kill
ManagedOOMMemoryPressure=kill

[root@...alhost ~]# cat /lib/systemd/system/user.slice 
# Copyright (C) 2024 Codethink Limited
# SPDX-License-Identifier: GPL-2.0-only

#
# This slice will control all processes started by systemd-logind
#

[Unit]
Description=User and Session Slice
Documentation=man:systemd.special(7)
Before=slices.target

[Slice]
CPUQuota=25%
AllowedCPUs=0
MemoryAccounting=true
MemoryMax=80%
ManagedOOMSwap=kill
ManagedOOMMemoryPressure=kill
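For reference, the AllowedCPUs= values above use systemd's CPU list syntax: CPU indices or dash-separated ranges, separated by commas or whitespace (e.g. 0,2,3 or 0-3). systemd applies the resulting set to the slice's cgroup v2 cpuset.cpus. Below is a minimal C sketch of turning such a list into a bitmask. It assumes at most 64 CPUs, and parse_cpu_list() is a hypothetical helper, not a systemd or kernel API:

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a CPU list such as "0,2,3" or "0-3"
 * (commas or spaces as separators) into a 64-bit mask.
 * Assumes CPUs are numbered below 64.
 * Returns 0 on success, -1 on malformed input or out-of-range CPUs. */
int parse_cpu_list(const char *s, uint64_t *mask)
{
    uint64_t m = 0;

    while (*s) {
        char *end;
        unsigned long lo, hi;

        /* Skip separators between entries. */
        while (*s == ',' || *s == ' ')
            s++;
        if (!*s)
            break;
        if (!isdigit((unsigned char)*s))
            return -1;

        lo = strtoul(s, &end, 10);
        hi = lo;
        s = end;

        /* Optional "lo-hi" range. */
        if (*s == '-') {
            s++;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtoul(s, &end, 10);
            s = end;
        }
        if (hi < lo || hi >= 64)
            return -1;

        for (unsigned long cpu = lo; cpu <= hi; cpu++)
            m |= UINT64_C(1) << cpu;

        /* Each entry must be followed by a separator or the end. */
        if (*s && *s != ',' && *s != ' ')
            return -1;
    }
    *mask = m;
    return 0;
}
```

With this sketch, AllowedCPUs=1 yields the mask 0x2 and AllowedCPUs=0,2,3 yields 0xd. Such a mask only restricts where the slice's tasks may run. By itself, it does not create a separate root domain.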

> However, this does not work because systemd is not creating an isolated
> cpuset... So, the root domain still contains CPUs 0-3, and the
> "custom-workload.slice" cpuset only has CPU 1. Hence, the check
>                         /*
>                          * Don't allow tasks with an affinity mask smaller than
>                          * the entire root_domain to become SCHED_DEADLINE. We
>                          * will also fail if there's no bandwidth available.
>                          */
>                         if (!cpumask_subset(span, p->cpus_ptr) ||
>                             rq->rd->dl_bw.bw == 0) {
>                                 retval = -EPERM;
>                                 goto unlock;
>                         }
> in __sched_setscheduler() fails.
> 
> 
> How are you configuring the cpusets?

See above.
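The check Luca quotes rejects a task whose slice is restricted to CPU 1 while the root domain still spans CPUs 0-3. The following is a minimal user-space model of just those quoted lines, to illustrate why. Plain 64-bit masks stand in for struct cpumask. mask_subset() and dl_admission_check() are hypothetical names, and the real kernel code has further conditions around this check that are not modelled here:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_subset(): true if every CPU
 * set in 'sub' is also set in 'super'. */
bool mask_subset(uint64_t sub, uint64_t super)
{
    return (sub & ~super) == 0;
}

/* Hypothetical model of the quoted admission check only:
 * - rd_span:   CPUs in the task's root domain (rq->rd->span)
 * - task_cpus: CPUs the task may run on (p->cpus_ptr)
 * - rd_dl_bw:  deadline bandwidth of the root domain (rq->rd->dl_bw.bw)
 * The task's affinity must cover the whole root domain, and the
 * root domain must have non-zero deadline bandwidth. */
int dl_admission_check(uint64_t rd_span, uint64_t task_cpus,
                       uint64_t rd_dl_bw)
{
    if (!mask_subset(rd_span, task_cpus) || rd_dl_bw == 0)
        return -EPERM;
    return 0;
}
```

For example, dl_admission_check(0xf, 0x2, 1) returns -EPERM (root domain on CPUs 0-3, task allowed only on CPU 1), whereas dl_admission_check(0x2, 0x2, 1) returns 0. In other words, in this model the check only passes if CPU 1 sits in a root domain of its own, or if the task may run on every CPU of its root domain.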

> Also, which kernel version are you using?
> (sorry if you already posted this information in previous emails and I am
> missing something obvious)

I am not sure whether I ever mentioned it explicitly, other than saying that we always run the latest stable kernel.

When we last ran extensive tests on this, two months ago, it was v6.13.6.

> 			Thanks,

Thank you!

> 				Luca

Cheers

Marcel
