Message-ID: <Z7w7g1zb0nfu9-C7@jlelli-thinkpadt14gen4.remote.csb>
Date: Mon, 24 Feb 2025 10:27:31 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Qais Yousef <qyousef@...alina.io>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>,
Christian Loehle <christian.loehle@....com>,
Jon Hunter <jonathanh@...dia.com>,
Thierry Reding <treding@...dia.com>,
Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Koutny <mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
Phil Auld <pauld@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Joel Fernandes (Google)" <joel@...lfernandes.org>,
Suleiman Souhlal <suleiman@...gle.com>,
Aashish Sharma <shraash@...gle.com>,
Shin Kawamura <kawasin@...gle.com>,
Vineeth Remanan Pillai <vineeth@...byteword.org>,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier
for hotplug
On 22/02/25 23:59, Qais Yousef wrote:
> On 02/17/25 15:52, Juri Lelli wrote:
> > On 16/02/25 16:33, Qais Yousef wrote:
> > > On 02/13/25 07:20, Juri Lelli wrote:
> > > > On 12/02/25 19:22, Dietmar Eggemann wrote:
> > > > > On 11/02/2025 11:42, Juri Lelli wrote:
> > > >
> > > > ...
> > > >
> > > > > > What if we actually ignore them consistently? We already do that for
> > > > > > admission control, so maybe we can do that when rebuilding domains as
> > > > > > well (until we maybe find a better way to deal with them).
> > > > > >
> > > > > > Does the following make any difference?
> > > > >
> > > > > It at least seems to solve the issue. And like you mentioned on irc, we
> > > > > don't know the bw req of sugov anyway.
> > > > >
> > > > > So with this change we start with 'dl_bw->total_bw = 0' even w/ sugov tasks.
> > > > >
> > > > > dl_rq[0]:
> > > > > .dl_nr_running : 0
> > > > > .dl_bw->bw : 996147
> > > > > .dl_bw->total_bw : 0 <-- !
> > > > >
> > > > > IMHO, people who want to run serious DL can always check whether there
> > > > > are already these infrastructural DL tasks or even avoid schedutil.
> > > >
> > > > It's definitely not ideal and admittedly gross, but not worse than what we
> > > > are doing already, considering we ignore sugovs at AC and the current
> > > > bandwidth allocation is there only to help with PI. So, duct tape. :/
> > > >
> > > > A more proper way to deal with this would entail coming up with a sensible
> > > > bandwidth allocation for sugovs, but that's most probably hardware
> > > > specific, so I am not sure how we can make it general enough.
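Just to make the 'ignore sugovs at AC' point above a bit more concrete, here
is a toy, standalone sketch of the kind of bookkeeping involved. This is not
kernel code and all the names are made up, but it mimics the to_ratio()-style
fixed-point accounting and simply skips 'special' (sugov-like) entities at
admission, which is why total_bw stays at 0 in the dump above.

/*
 * Toy model of DL admission control (not kernel code; names are made up).
 * Bandwidth is tracked as a 20-bit fixed-point fraction of a CPU, similar
 * to to_ratio(), and "special" entities (the sugov case) are simply not
 * accounted at admission control.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BW_SHIFT	20
#define MAX_BW		996147ULL	/* ~95% of one CPU in fixed point */

struct toy_dl_entity {
	uint64_t runtime_us;
	uint64_t period_us;
	bool special;			/* sugov-like, ignored at AC */
};

static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	return (runtime << BW_SHIFT) / period;
}

/* Returns true if the entity is admitted; updates *total_bw if accounted. */
static bool toy_dl_admit(const struct toy_dl_entity *dl, uint64_t *total_bw)
{
	uint64_t new_bw;

	if (dl->special)		/* consistently ignore sugov */
		return true;

	new_bw = to_ratio(dl->period_us, dl->runtime_us);
	if (*total_bw + new_bw > MAX_BW)
		return false;

	*total_bw += new_bw;
	return true;
}

int main(void)
{
	uint64_t total_bw = 0;
	struct toy_dl_entity sugov = { .runtime_us = 0, .period_us = 1, .special = true };
	struct toy_dl_entity task  = { .runtime_us = 30000, .period_us = 100000 };

	printf("sugov admitted: %d, total_bw = %llu\n",
	       toy_dl_admit(&sugov, &total_bw), (unsigned long long)total_bw);
	printf("task admitted:  %d, total_bw = %llu\n",
	       toy_dl_admit(&task, &total_bw), (unsigned long long)total_bw);
	return 0;
}

The open question remains what a sensible non-zero reservation for sugov
would look like, which is the hardware specific part mentioned above.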
> > >
> > > I haven't been following the problem closely, but here is one thing I was
> > > considering; I don't know if it makes sense to you, and it could help with
> > > this problem too. Shall we lump sugov in with the stopper class, or create a
> > > new sched_class (seems unnecessary, I think stopper should do)? With the
> > > consolidated cpufreq update patch I've been working on, Vincent raised issues
> > > with a potential new ctx switch, and to improve that I needed to look at
> > > improving the sugov wakeup path. If we decouple sugov from DL, I think that
> > > might fix your problem here and could allow us to special case it for other
> > > problems, like the ones I faced, more easily without messing up DL.
> > >
> > > Has the time come to consider retiring the simple solution of making sugov a
> > > fake DL task?
> >
> > The problem is that 'ideally' we would want to explicitly take sugovs into
> > account when designing the system. We don't do that currently, as a
> > 'temporary solution' that seemed simpler than a proper approach (I've started
> > wondering if it's indeed simpler). So, I'm not sure that moving sugovs
> > outside DL is something we want to do.
>
> Okay, I see. The issue though is that a DL system with power management
> features enabled, ones that warrant waking up a sugov thread to update the
> frequency, is sort of half broken by design. I don't see the benefit over
> using RT in this case. But I appreciate I could be misguided, so take it easy
> on me if this is an obviously wrong understanding :) I know usage of DL in
> Android has been difficult, but many systems ship with slow-switching hardware.
>
> How does DL handle the long softirqs from the block and network layers, by
> the way? This has been a problem in practice for RT tasks, so it should be
> for DL too. sugov done in the stopper class should be handled similarly IMHO.
> I *think* it would be simpler to masquerade the sugov thread as irq pressure.
Kind of a trick question :), as DL doesn't handle this kind of
load/pressure explicitly. It is essentially agnostic about it. From a
system design point of view though, I would say that one should take
that into account and maybe convert sensible kthreads to DL, so that the
overall bandwidth can be explicitly evaluated. If one doesn't do that, a
probably less sound approach is to treat anything not explicitly
scheduled by DL, but still required from a system perspective, as
overload, and be more conservative when assigning bandwidth to DL tasks
(i.e. reduce the maximum amount of available bandwidth, so that the
system doesn't get saturated).
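To put rough numbers on it (just as an illustration, not a recommendation):
the default admission cap comes from sched_rt_runtime_us/sched_rt_period_us =
950000/1000000, which is the ~0.95 of a CPU you see above as
dl_bw->bw = 996147 (0.95 in 20-bit fixed point). If the infrastructural
kthreads plus softirq load were estimated at, say, 5% of a CPU, lowering
/proc/sys/kernel/sched_rt_runtime_us to 900000 would cap DL admission at ~90%
and leave that headroom outside DL.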
> If moving it to another class really doesn't make sense, could you instead
> use rate_limit_us as a potential guide for how much bandwidth sugov needs?
Or maybe try to estimate/measure how much utilization sugov threads are
effectively using while running some kind of workload of interest and
use that as an indication for DL runtime/period.
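Just as a made-up example of what I mean: if tracing showed a sugov kthread
burning ~50us per frequency update, and updates are rate limited to one every
2000us (rate_limit_us), a reservation in the ballpark of runtime = 50us over
period = 2000us, i.e. ~2.5% of a CPU, would be a starting point, to be refined
against the actual hardware.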