linux-kernel - Re: SCHED_DEADLINE with CPU affinity

Message-ID: <20191120085024.GB23227@localhost.localdomain>
Date:   Wed, 20 Nov 2019 09:50:24 +0100
From:   Juri Lelli <juri.lelli@...hat.com>
To:     Philipp Stanner <stanner@...teo.de>
Cc:     linux-kernel@...r.kernel.org, Hagen Pfeifer <hagen@...u.net>,
        mingo@...hat.com, peterz@...radead.org, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de
Subject: Re: SCHED_DEADLINE with CPU affinity

Hi Philipp,

On 19/11/19 23:20, Philipp Stanner wrote:
> Hey folks,
> (please put me in CC when answering, I'm not subscribed)
> 
> I'm currently a working student in the embedded industry. We have a device where
> we need to be able to process network data within a certain deadline. At the
> same time, safety is a primary requirement; that's why we build everything
> fully redundant. Meaning: we have two network interfaces, each with its IRQ
> bound to one CPU core, and for each we spawn a container (systemd-nspawn,
> cgroups based) which in turn is bound to the corresponding CPU (CPU affinity
> masked).
> 
>         Container0       Container1
>    -----------------  -----------------
>    |               |  |               |
>    |    Proc. A    |  |   Proc. A'    |
>    |    Proc. B    |  |   Proc. B'    |
>    |               |  |               |
>    -----------------  -----------------
>           ^                  ^
>           |                  |
>         CPU 0              CPU 1
>           |                  |
>        IRQ eth0           IRQ eth1
> 
> 
> Within each container several processes are started, ranging from systemd
> (SCHED_OTHER) to two (soft) real-time critical processes, which we want to
> execute via SCHED_DEADLINE.
> 
> Now, I've worked through the manpage describing scheduling policies, and it
> seems that our scenario is forbidden by the kernel.  I've done some tests with
> the syscalls sched_setattr and sched_setaffinity, trying to activate
> SCHED_DEADLINE while also binding to a certain core.  It fails with EINVAL or
> EBUSY, depending on the order of the syscalls.
> 
> I've read that the kernel performs plausibility checks when you ask for a

Yeah, admission control.

> new deadline task to be scheduled, and I assume this check is what prevents us
> from implementing our intended architecture.
> 
> Now, the questions we're having are:
> 
>    1. Why does the kernel do this, what is the problem with scheduling with
>       SCHED_DEADLINE on a certain core? In contrast, how is it handled when
>       you have single core systems etc.? Why this artificial limitation?

Please also have a look (you only mentioned the manpage, so in case you
missed it) at

https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667

and the document in general should hopefully answer why we need
admission control and what the current limitations regarding affinities
are.
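
To make the admission test a bit more concrete: as described in
sched-deadline.rst, a -deadline task is only accepted if the total
utilization (the sum of runtime/period over all -deadline tasks) in a root
domain stays within M * sched_rt_runtime_us / sched_rt_period_us, where M is
the number of CPUs in that root domain. The following is only a rough sketch
of that arithmetic, not the kernel's implementation; the function name and
the (runtime, period) tuples are made up for illustration, and the defaults
assume the usual 950000/1000000 sysctl values.

```python
def dl_admissible(tasks, n_cpus, rt_runtime_us=950_000, rt_period_us=1_000_000):
    """Return True if the task set passes a simplified admission test.

    tasks  -- iterable of (runtime, period) pairs, both in the same time unit
    n_cpus -- number of CPUs in the root domain (M in sched-deadline.rst)

    The two *_us defaults mirror the usual values of
    /proc/sys/kernel/sched_rt_runtime_us and sched_rt_period_us (assumption).
    """
    # Total bandwidth requested by all -deadline tasks.
    total_utilization = sum(runtime / period for runtime, period in tasks)
    # Bandwidth available to -deadline tasks in this root domain.
    limit = n_cpus * rt_runtime_us / rt_period_us
    return total_utilization <= limit


# Hypothetical task set in milliseconds: 10/30 + 20/50 is roughly 0.73.
tasks = [(10, 30), (20, 50)]
print(dl_admissible(tasks, n_cpus=1))               # fits on one CPU
print(dl_admissible(tasks + [(30, 100)], n_cpus=1))  # exceeds the 0.95 limit
```

The test is per root domain, which is why a task whose affinity covers only
part of a root domain cannot be accounted for properly.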

>    2. How can we possibly implement this? We don't want to use SCHED_FIFO,
>       because out-of-control tasks would freeze the entire container.

I experimented myself a bit with this kind of setup in the past and I
think I made it work by pre-configuring exclusive cpusets (similar to
what is detailed in the doc above) and then starting containers inside
such exclusive sets with the podman run --cgroup-parent option.
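
A possible sketch of the cpuset part, modeled on the example in
sched-deadline.rst. It assumes a cgroup v1 cpuset hierarchy (the doc mounts
it at /dev/cpuset); the function names, the "cpu0" set name and the mount
point argument are hypothetical, and the container/podman side is not
covered here.

```python
from pathlib import Path


def exclusive_cpuset_writes(name, cpu, mem_node=0):
    """List the (file, value) writes for one exclusive single-CPU cpuset.

    Paths are relative to the cpuset mount point. The sequence follows the
    cgroup v1 example in sched-deadline.rst; it is a sketch, not a tested
    recipe for every kernel or cgroup setup.
    """
    return [
        (f"{name}/cpuset.cpus", str(cpu)),       # CPU owned by this set
        (f"{name}/cpuset.mems", str(mem_node)),  # memory node for this set
        ("cpuset.cpu_exclusive", "1"),           # root set: exclusive CPUs
        ("cpuset.sched_load_balance", "0"),      # root set: no load balancing
        (f"{name}/cpuset.cpu_exclusive", "1"),
        (f"{name}/cpuset.mem_exclusive", "1"),
    ]


def apply_writes(root, name, writes):
    """Create the cpuset directory and perform the writes (needs root)."""
    root = Path(root)
    (root / name).mkdir(parents=True, exist_ok=True)
    for rel_path, value in writes:
        (root / rel_path).write_text(value + "\n")


if __name__ == "__main__":
    # Dry run: only print what would be written under /dev/cpuset.
    for rel_path, value in exclusive_cpuset_writes("cpu0", 0):
        print(f"echo {value} > /dev/cpuset/{rel_path}")
```

With load balancing disabled in the root set, each exclusive set ends up as
its own root domain, so the admission test is done per set.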

I don't have proper instructions yet for how to do this (I plan to put
them together soon-ish), but please see if you can make it work with
this hint.
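
Once a task sits inside such an exclusive cpuset, the SCHED_DEADLINE
request itself goes through sched_setattr. A minimal sketch of that call via
ctypes follows. The struct layout is the original 48-byte struct sched_attr
from sched_setattr(2); the syscall number 314 is an assumption that only
holds on x86_64, and the helper names are hypothetical. The call needs
appropriate privileges and is not expected to succeed outside a properly
configured root domain.

```python
import ctypes
import os

SCHED_DEADLINE = 6              # policy value from the Linux uapi headers
NR_SCHED_SETATTR_X86_64 = 314   # assumption: x86_64 syscall number only


class SchedAttr(ctypes.Structure):
    """Original (48-byte) layout of struct sched_attr, see sched_setattr(2)."""
    _fields_ = [
        ("size", ctypes.c_uint32),
        ("sched_policy", ctypes.c_uint32),
        ("sched_flags", ctypes.c_uint64),
        ("sched_nice", ctypes.c_int32),
        ("sched_priority", ctypes.c_uint32),
        ("sched_runtime", ctypes.c_uint64),   # nanoseconds
        ("sched_deadline", ctypes.c_uint64),  # nanoseconds
        ("sched_period", ctypes.c_uint64),    # nanoseconds
    ]


def make_deadline_attr(runtime_ns, deadline_ns, period_ns):
    """Build a sched_attr for SCHED_DEADLINE after a basic sanity check."""
    if not (0 < runtime_ns <= deadline_ns <= period_ns):
        raise ValueError("need 0 < runtime <= deadline <= period")
    attr = SchedAttr()
    attr.size = ctypes.sizeof(SchedAttr)
    attr.sched_policy = SCHED_DEADLINE
    attr.sched_runtime = runtime_ns
    attr.sched_deadline = deadline_ns
    attr.sched_period = period_ns
    return attr


def set_deadline(attr, pid=0):
    """Invoke sched_setattr(pid, &attr, 0); pid 0 means the calling thread."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.syscall(ctypes.c_long(NR_SCHED_SETATTR_X86_64),
                       ctypes.c_int(pid), ctypes.byref(attr),
                       ctypes.c_uint(0))
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


# Example parameters: 10 ms of runtime every 30 ms.
attr = make_deadline_attr(10_000_000, 30_000_000, 30_000_000)
```

Calling set_deadline(attr) from a thread inside the exclusive cpuset is the
step that goes through admission control; an OSError carries the errno the
kernel returned.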

Best,

Juri