Message-ID: <20231107081602.GP8262@noisy.programming.kicks-ass.net>
Date: Tue, 7 Nov 2023 09:16:02 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Daniel Bristot de Oliveira <bristot@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org,
Luca Abeni <luca.abeni@...tannapisa.it>,
Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
Thomas Gleixner <tglx@...utronix.de>,
Joel Fernandes <joel@...lfernandes.org>,
Vineeth Pillai <vineeth@...byteword.org>,
Shuah Khan <skhan@...uxfoundation.org>,
Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v5 7/7] sched/fair: Fair server interface
On Mon, Nov 06, 2023 at 05:29:49PM +0100, Daniel Bristot de Oliveira wrote:
> On 11/6/23 16:40, Peter Zijlstra wrote:
> > On Sat, Nov 04, 2023 at 11:59:24AM +0100, Daniel Bristot de Oliveira wrote:
> >> Add an interface for fair server setup on debugfs.
> >>
> >> Each rq has three files under /sys/kernel/debug/sched/rq/CPU{ID}:
> >>
> >> - fair_server_runtime: set runtime in ns
> >> - fair_server_period: set period in ns
> >> - fair_server_defer: on/off for the defer mechanism
> >>
> >
> > This then leaves /proc/sys/kernel/sched_rt_{period,runtime}_us to be the
> > total available bandwidth control, right?
>
> right, but thinking aloud... given that the per-cpu files are already allocating the
> bandwidth on the dl_rq, the spare time for the fair scheduler is granted.
>
> Still, we can keep them there as a safeguard against overloading the deadline
> scheduler... (thinking aloud 2) as long as the global limit is a thing... as we move
> away from it, that global limitation will make less sense - still, it is better to have
> some form of limitation so people are aware of how much bandwidth has been handed out.
Yeah, so having a limit on the deadline thing seems prudent as a way to
model system overhead. I mean 100% sounds nice, but then all the models
also assume no interrupts, no scheduler or migration overhead etc.. So
setting a slightly lower max seems far more realistic to me.
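Concretely, that doesn't need anything new, it's just the existing knobs; roughly
(with their usual 95% default spelled out):

  # keep total RT/DL admission below 100% to leave slack for overhead,
  # e.g. the usual 95% (950ms out of every 1s):
  echo 1000000 > /proc/sys/kernel/sched_rt_period_us
  echo  950000 > /proc/sys/kernel/sched_rt_runtime_us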
That said, the period/bandwidth thing is now slightly odd, as we really
only care about the utilization. But whatever. One thing at a time.
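FWIW, for the per-CPU side, with the interface from your changelog above it
would be something along these lines (paths and file names as quoted there,
values picked arbitrarily):

  # reserve 50ms out of every 1s for the fair server on CPU 0:
  echo 1000000000 > /sys/kernel/debug/sched/rq/CPU0/fair_server_period
  echo   50000000 > /sys/kernel/debug/sched/rq/CPU0/fair_server_runtime
  # defer mechanism on (assuming the usual 0/1 boolean here):
  echo 1 > /sys/kernel/debug/sched/rq/CPU0/fair_server_defer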
> > But then shouldn't we also rip out the throttle thingy right quick?
> >
>
> I was thinking about moving the entire throttling machinery inside CONFIG_RT_GROUP_SCHED
> for now, because GROUP_SCHED depends on it, no?
Yes. Until we can delete all that code we'll have to keep some of that.
> With the next step of moving the dl server to be the base for the
> hierarchical scheduling... that will rip out
> CONFIG_RT_GROUP_SCHED... replacing it with something that has a per-cpu interface.
>
> Does it make sense?
I'm still not sure how to deal with affinities and deadline servers for
RT.
There's a bunch of issues and I think we've only got some of them solved.
The semi-partitioned thing (someone was working on that, I think you
know the guy) solves DL 'entities' having affinities.
But the problem with FIFO is that those tasks don't have inherent bandwidth. This
in turn means that any server for FIFO needs to be minimally concurrent,
otherwise you hand out bandwidth to lower priority tasks that the higher
priority task might want etc.. (Andersson's group has papers here).
Specifically, imagine a server with U=1.5 and 3 tasks: a high prio task
that requires .8, a medium prio task that requires .6, and a low prio task
that soaks up whatever it can get its little grubby paws on.
Then with minimal concurrency this works out nicely: high gets .8, mid
gets .6 and low gets the remaining .1.
If OTOH you don't limit concurrency and let them all run concurrently,
you can end up with the situation where they each get .5, which is
obviously a fail.
Add affinities here though and you're up a creek: how do you distribute
utilization between the slices, what slices, etc.. You say give them a
per-cpu cgroup interface and have them configure it themselves, but
that's a god-awful thing to ask userspace to do.
Ideally, I'd delete all of FIFO; it's such a horrid trainwreck, a total
and abysmal failure of a model -- thank you POSIX :-(