Message-ID: <23bab6d8-9256-49d2-b6d2-ac344df925ae@kernel.org>
Date: Tue, 7 Nov 2023 15:06:51 +0100
From: Daniel Bristot de Oliveira <bristot@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org,
Luca Abeni <luca.abeni@...tannapisa.it>,
Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>,
Thomas Gleixner <tglx@...utronix.de>,
Joel Fernandes <joel@...lfernandes.org>,
Vineeth Pillai <vineeth@...byteword.org>,
Shuah Khan <skhan@...uxfoundation.org>,
Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH v5 7/7] sched/fair: Fair server interface
On 11/7/23 09:16, Peter Zijlstra wrote:
> On Mon, Nov 06, 2023 at 05:29:49PM +0100, Daniel Bristot de Oliveira wrote:
>> On 11/6/23 16:40, Peter Zijlstra wrote:
>>> On Sat, Nov 04, 2023 at 11:59:24AM +0100, Daniel Bristot de Oliveira wrote:
>>>> Add an interface for fair server setup on debugfs.
>>>>
>>>> Each rq has three files under /sys/kernel/debug/sched/rq/CPU{ID}:
>>>>
>>>> - fair_server_runtime: set runtime in ns
>>>> - fair_server_period: set period in ns
>>>> - fair_server_defer: on/off for the defer mechanism
>>>>
>>>
>>> This then leaves /proc/sys/kernel/sched_rt_{period,runtime}_us to be the
>>> total available bandwidth control, right?
>>
>> Right, but thinking aloud... given that the per-cpu files already allocate the
>> bandwidth on the dl_rq, the spare time for the fair scheduler is granted.
>>
>> Still, we can keep them there as a safeguard against overloading the deadline
>> scheduler... (thinking aloud 2) as long as the global limit is a thing... as we move
>> away from it, that global limitation will make less sense - still, it is better to have
>> some form of limitation so people are aware of how much bandwidth is available.
>
> Yeah, so having a limit on the deadline thing seems prudent as a way to
> model system overhead. I mean 100% sounds nice, but then all the models
> also assume no interrupts, no scheduler or migration overhead etc.. So
> setting a slightly lower max seems far more realistic to me.
>
> That said, the period/bandwidth thing is now slightly odd, as we really
> only care about the utilization. But whatever. One thing at a time.
Yep, that is why I am mentioning the generalization as a second phase; it is
a harder problem... But taking the RT throttling out of the default path is
already a good step.
>
>>> But then shouldn't we also rip out the throttle thingy right quick?
>>>
>>
>> I was thinking about moving the entire throttling machinery inside CONFIG_RT_GROUP_SCHED
>> for now, because GROUP_SCHED depends on it, no?
>
> Yes. Until we can delete all that code we'll have to keep some of that.
>
>> With the next step of moving to the dl server as the base for
>> hierarchical scheduling... that will rip out
>> CONFIG_RT_GROUP_SCHED... replacing it with a per-cpu interface.
>>
>> Does it make sense?
>
> I'm still not sure how to deal with affinities and deadline servers for
> RT.
>
> There's a bunch of issues and I think we've only got some of them solved.
>
> The semi-partitioned thing (someone was working on that, I think you
> know the guy), solves DL 'entities' having affinities.
Yep, and having arbitrary affinities is another step towards more flexible models...
> But the problem of FIFO is that they don't have inherent bandwidth. This
> in turn means that any server for FIFO needs to be minimally concurrent,
> otherwise you hand out bandwidth to lower priority tasks that the higher
> priority task might want etc.. (Andersson's group has papers here).
>
> Specifically, imagine a server with U=1.5 and 3 tasks: a high prio task
> that requires .8, a medium prio task that requires .6, and a low prio task
> that soaks up whatever it can get its little grubby paws on.
>
> Then with minimal concurrency this works out nicely, high gets .8, mid
> gets .6 and low gets the remaining .1.
>
> If OTOH you don't limit concurrency and let them all run concurrently,
> you can end up with the situation where they each get .5. Which is
> obviously fail.
>
> Add affinities here though and you're up a creek, how do you distribute
> utilization between the slices, what slices, etc.. You say give them a
> per-cpu cgroup interface, and have them configure it themselves, but
> that's a god-awful thing to ask userspace to do.
And yep again... it is definitely a harder topic... but it gets simpler as we
make those other moves...
> Ideally, I'd delete all of FIFO, it's such a horrid trainwreck, a total
> and abysmal failure of a model -- thank you POSIX :-(
-- Daniel