linux-kernel - SCHED_DEADLINE with CPU affinity

Message-ID: <1574202052.1931.17.camel@posteo.de>
Date:   Tue, 19 Nov 2019 23:20:52 +0100
From:   Philipp Stanner <stanner@...teo.de>
To:     linux-kernel@...r.kernel.org
Cc:     Hagen Pfeifer <hagen@...u.net>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de
Subject: SCHED_DEADLINE with CPU affinity

Hey folks,
(please put me in CC when answering, I'm not subscribed)

I'm currently a working student in the embedded industry. We have a device that
must process network data within a certain deadline. At the same time, safety
is a primary requirement, which is why we build everything fully redundant:
we have two network interfaces, each with its IRQ bound to one CPU core, and
for each core we spawn a container (systemd-nspawn, cgroups based) that is in
turn bound to the corresponding CPU via its CPU affinity mask.

        Container0       Container1
   -----------------  -----------------
   |               |  |               |
   |    Proc. A    |  |   Proc. A'    |
   |    Proc. B    |  |   Proc. B'    |
   |               |  |               |
   -----------------  -----------------
          ^                  ^
          |                  |
        CPU 0              CPU 1
          |                  |
       IRQ eth0           IRQ eth1

Within each container several processes are started, ranging from systemd
(SCHED_OTHER) to two (soft) real-time critical processes, which we want to
run under SCHED_DEADLINE.

Now, I've worked through the manpage describing scheduling policies, and it
seems that our scenario is forbidden by the kernel.  I've done some tests with
the syscalls sched_setattr and sched_setaffinity, trying to activate
SCHED_DEADLINE while also binding to a certain core.  It fails with EINVAL or
EBUSY, depending on the order of the syscalls.
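
A minimal sketch of such a test might look like the following. The struct name
dl_attr and the helper names are made up for illustration (dl_attr mirrors the
layout of the kernel's struct sched_attr), and the runtime/deadline/period
values passed in are placeholders, not the ones from our actual tests.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Local mirror of the kernel's struct sched_attr (original 48-byte layout).
 * Hypothetical name, chosen so it cannot clash with a libc definition. */
struct dl_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;   /* nanoseconds */
    uint64_t sched_deadline;  /* nanoseconds */
    uint64_t sched_period;    /* nanoseconds */
};
static_assert(sizeof(struct dl_attr) == 48, "unexpected struct layout");

/* Fill in a SCHED_DEADLINE request; all times are in nanoseconds. */
static void dl_attr_fill(struct dl_attr *a, uint64_t runtime_ns,
                         uint64_t deadline_ns, uint64_t period_ns)
{
    memset(a, 0, sizeof(*a));
    a->size = sizeof(*a);
    a->sched_policy = SCHED_DEADLINE;
    a->sched_runtime = runtime_ns;
    a->sched_deadline = deadline_ns;
    a->sched_period = period_ns;
}

/* Restrict the calling thread to one CPU. Returns 0 or the errno value. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : errno;
}

/* Ask the kernel to switch the calling thread to SCHED_DEADLINE.
 * Uses the raw syscall; returns 0 or the errno value. */
static int request_deadline(const struct dl_attr *a)
{
    return syscall(SYS_sched_setattr, 0, a, 0U) == 0 ? 0 : errno;
}
```

Calling pin_to_cpu(0) followed by request_deadline(&a), and then the reverse
order in a fresh process, and printing strerror() of each return value is
enough to see which of the two calls the kernel rejects in each ordering.
Setting SCHED_DEADLINE normally requires root (CAP_SYS_NICE), so an
unprivileged run will fail earlier with a permission error.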

I've read that the kernel performs plausibility checks when you ask for a
new deadline task to be scheduled, and I assume this check is what prevents us
from implementing our intended architecture.

Now, the questions we're having are:

   1. Why does the kernel do this, what is the problem with scheduling with
      SCHED_DEADLINE on a certain core? In contrast, how is it handled when
      you have single core systems etc.? Why this artificial limitation?
   2. How can we possibly implement this? We don't want to use SCHED_FIFO,
      because out-of-control tasks would freeze the entire container.

SCHED_RR / SCHED_FIFO would probably be better policies than SCHED_OTHER,
but SCHED_DEADLINE is exactly what we are looking for.

Cheers,
Philipp