linux-kernel - Re: SCHED_DEADLINE with CPU affinity


Message-ID: <20200113092216.GA14325@localhost.localdomain>
Date:   Mon, 13 Jan 2020 10:22:16 +0100
From:   Juri Lelli <juri.lelli@...hat.com>
To:     Philipp Stanner <stanner@...teo.de>
Cc:     linux-kernel@...r.kernel.org, Hagen Pfeifer <hagen@...u.net>,
        mingo@...hat.com, peterz@...radead.org, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de
Subject: Re: SCHED_DEADLINE with CPU affinity

Hi,

Sorry for the delay in replying (Xmas + catching up with emails).

On 24/12/19 11:03, Philipp Stanner wrote:
> On Wed, 20.11.2019, 09:50 +0100 Juri Lelli wrote:
> > Hi Philipp,
> 
> Hey Juri,
> 
> thanks so far; we indeed could make it work with exclusive CPU-sets.

Good. :-)
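
For reference, the exclusive cpuset setup can be sketched roughly as below. This is a minimal sketch modeled on the cpuset example in Documentation/scheduler/sched-deadline.rst, not a tested recipe: the helper name make_exclusive_cpuset is made up for illustration, and the file names assume a cgroup v1 cpuset hierarchy (e.g. mounted at /dev/cpuset) and root privileges.

```python
import os


def make_exclusive_cpuset(root, name, cpus, mems="0"):
    """Create an exclusive cpuset `name` below the cpuset mount point `root`.

    Hypothetical helper: the file names assume the cgroup v1 cpuset
    controller; `cpus` and `mems` are strings such as "0" or "2-3".
    """
    def write(path, value):
        with open(path, "w") as f:
            f.write(value)

    child = os.path.join(root, name)
    os.makedirs(child, exist_ok=True)

    # CPUs and memory nodes owned by the new set.
    write(os.path.join(child, "cpuset.cpus"), cpus)
    write(os.path.join(child, "cpuset.mems"), mems)

    # Make the parent exclusive and stop load balancing across it, so that
    # the child set ends up in its own root domain.
    write(os.path.join(root, "cpuset.cpu_exclusive"), "1")
    write(os.path.join(root, "cpuset.sched_load_balance"), "0")

    # Mark the child itself as exclusive.
    write(os.path.join(child, "cpuset.cpu_exclusive"), "1")
    write(os.path.join(child, "cpuset.mem_exclusive"), "1")
    return child
```

Something like make_exclusive_cpuset("/dev/cpuset", "cpu0", "0") would then give a set that tasks (or a container's cgroup parent) can be placed into.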

> On 19/11/19 23:20, Philipp Stanner wrote:
> > 
> > > from implementing our intended architecture.
> > > 
> > > Now, the questions we're having are:
> > > 
> > >    1. Why does the kernel do this, what is the problem with
> > >       scheduling with SCHED_DEADLINE on a certain core? In contrast,
> > >       how is it handled when you have single core systems etc.? Why
> > >       this artificial limitation?
> > 
> > Please have also a look (you only mentioned manpage so, in case you
> > missed it) at
> > 
> > https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L667
> > 
> > and the document in general should hopefully give you the answer about
> > why we need admission control and current limitations regarding
> > affinities.
> > 
> > >    2. How can we possibly implement this? We don't want to use
> > >       SCHED_FIFO, because out-of-control tasks would freeze the
> > >       entire container.
> > 
> > I experimented myself a bit with this kind of setup in the past and I
> > think I made it work by pre-configuring exclusive cpusets (similarly to
> > what is detailed in the doc above) and then starting containers inside
> > such exclusive sets with the podman run --cgroup-parent option.
> > 
> > I don't have proper instructions yet for how to do this (plan to put
> > them together soon-ish), but please see if you can make it work with
> > this hint.
> 
> I fear I have not understood quite well yet why this
> "workaround" leads to (presumably) the same results as set_affinity
> would. From what I have read, I understand it as follows: For
> SCHED_DEADLINE, admission control tries to guarantee that the requested
> policy can be executed. To do so, it analyzes the current workload
> situation, taking especially the number of cores into account.
> 
> Now, with a pre-configured set, the kernel knows which tasks will run
> on which core, therefore it's able to judge whether a process can be
> deadline scheduled or not. But when using the default way, you could
> start your processes as SCHED_OTHER, set SCHED_DEADLINE as policy and
> later many of them could suddenly call set_affinity, desiring to run on
> the same core, therefore provoking collisions.

But setting affinity would still have to pass admission control, and
should fail in the case you are describing (IIUC).

https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c#L5433
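
As a rough model of the two checks involved (a sketch only, not the kernel code; the function names are made up for illustration): bandwidth is accounted per root domain, so a -deadline task is only allowed an affinity mask that covers every CPU of its root domain, and the total utilization must stay within M * (sched_rt_runtime_us / sched_rt_period_us) as described in sched-deadline.rst. The 950000/1000000 values below are the usual defaults and are an assumption here.

```python
from fractions import Fraction


def affinity_change_allowed(root_domain_span, new_mask):
    """Model of the affinity check for a -deadline task.

    The change is accepted only if the new mask contains every CPU of
    the task's root domain; to my understanding the kernel rejects it
    with EBUSY otherwise.
    """
    return set(root_domain_span) <= set(new_mask)


def bandwidth_admitted(tasks, n_cpus,
                       rt_runtime_us=950_000, rt_period_us=1_000_000):
    """Model of the utilization test from sched-deadline.rst.

    `tasks` is an iterable of (runtime, period) integer pairs in the
    same unit. Admitted if
    sum(runtime / period) <= n_cpus * rt_runtime_us / rt_period_us.
    """
    total = sum(Fraction(runtime, period) for runtime, period in tasks)
    return total <= n_cpus * Fraction(rt_runtime_us, rt_period_us)


# Default root domain spanning CPUs 0-3: pinning to CPU 3 alone is refused.
print(affinity_change_allowed({0, 1, 2, 3}, {3}))
# Exclusive cpuset whose root domain is just CPU 3: the same mask is fine.
print(affinity_change_allowed({3}, {3}))
```

In this model, the exclusive cpuset does not bypass admission control; it shrinks the root domain so that the mask the task wants is the whole domain.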

Best,

Juri