Message-ID: <CAKfTPtAvuuOTmuMpzs8GUpUebL76h7F8zuN1tnJz_KFYxAFN3w@mail.gmail.com>
Date: Fri, 20 Jun 2025 12:29:27 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] sched/fair: Manage lag and run to parity with
different slices
On Fri, 20 Jun 2025 at 10:42, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Jun 19, 2025 at 02:27:43PM +0200, Vincent Guittot wrote:
> > On Wed, 18 Jun 2025 at 09:03, Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> > >
> > > On Tue, 17 Jun 2025 at 11:22, Peter Zijlstra <peterz@...radead.org> wrote:
> > > >
> > > > On Fri, Jun 13, 2025 at 04:05:10PM +0200, Vincent Guittot wrote:
> > > > > Vincent Guittot (3):
> > > > > sched/fair: Use protect_slice() instead of direct comparison
> > > > > sched/fair: Limit run to parity to the min slice of enqueued entities
> > > > > sched/fair: Improve NO_RUN_TO_PARITY
> > > >
> > > > Ah. I wrote these here patches and then totally forgot about them :/.
> > > > They take a different approach.
> > > >
> > > > The approach I took was to move decision to stick with curr after pick,
> > > > instead of before it. That way we can evaluate the tree at the time of
> > > > preemption.
> > >
> > > Let me have a look at your patches
> >
> > I have looked at and tested your patches, but they don't solve the lag and
> > run to parity issues; not sure what is going wrong.
>
> Humm.. So what you do in patch 3, setting the protection to min_slice
> instead of the deadline, that only takes into account the tasks present
> at the point we schedule.
Yes, but at this point any waking task is either the next running
task or already enqueued in the rb tree.
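As a rough userspace sketch of that idea (all names below are made up
for illustration, this is not the actual patch): the protection granted
to the picked task is bounded by the shortest slice among itself and
the entities already enqueued at that point.

#include <stdint.h>
#include <stdio.h>

struct entity {
	uint64_t slice;		/* requested slice, in ns */
};

/* shortest slice among curr and the @nr entities currently enqueued */
static uint64_t min_enqueued_slice(const struct entity *curr,
				   const struct entity *queued, int nr)
{
	uint64_t min = curr->slice;

	for (int i = 0; i < nr; i++)
		if (queued[i].slice < min)
			min = queued[i].slice;
	return min;
}

int main(void)
{
	struct entity curr = { .slice = 3000000 };		/* 3 ms */
	struct entity queued[] = { { 2800000 }, { 700000 } };	/* 2.8 ms, 0.7 ms */

	/* protection bounded by the 0.7 ms entity already in the tree */
	printf("protect curr for %llu ns\n",
	       (unsigned long long)min_enqueued_slice(&curr, queued, 2));
	return 0;
}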
>
> Which is why I approached it by moving the protection to after pick;
> because then we can directly compare the task we're running to the
> best pick -- which includes the tasks that got woken. This gives
> check_preempt_wakeup_fair() better chances.
We don't always want to break the run to parity, only when a task
wakes up and should either preempt current or shorten the run to parity
period. Otherwise, the protection applies for a duration that is short
enough to stay fair to the others.
I will see whether check_preempt_wakeup_fair() can be smarter when
deciding to cancel the protection.
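Roughly along these lines, as a simplified standalone sketch (invented
names, deadline comparison only, ignoring eligibility; this is not
check_preempt_wakeup_fair() itself):

#include <stdint.h>
#include <stdio.h>

struct entity {
	uint64_t deadline;	/* virtual deadline */
	uint64_t slice;		/* requested slice, in ns */
};

enum wake_action { KEEP_PROTECTION, SHORTEN_PROTECTION, CANCEL_PROTECTION };

static enum wake_action on_wakeup(const struct entity *curr,
				  const struct entity *woken)
{
	if (woken->deadline < curr->deadline)
		return CANCEL_PROTECTION;	/* the wakeup should preempt */
	if (woken->slice < curr->slice)
		return SHORTEN_PROTECTION;	/* protect only up to the shorter slice */
	return KEEP_PROTECTION;			/* keep running to parity */
}

int main(void)
{
	struct entity curr  = { .deadline = 100, .slice = 3000000 };
	struct entity woken = { .deadline = 150, .slice =  700000 };

	printf("action: %d\n", on_wakeup(&curr, &woken));	/* SHORTEN_PROTECTION */
	return 0;
}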
>
> To be fair, I did not get around to testing the patches much beyond
> booting them, so quite possibly they're buggered :-/
>
> > Also, my patchset takes into account the NO_RUN_TO_PARITY case by
> > adding a notion of quantum execution time, which was missing until now
>
> Right; not ideal, but I suppose for the people that disable
> RUN_TO_PARITY it might make sense. But perhaps there should be a little
> more justification for why we bother tweaking a non-default option.
Otherwise, disabling RUN_TO_PARITY to check whether it's the root cause
of a regression or another problem becomes pointless, because the
behavior without the feature is wrong.
And some might not want to run to parity but rather behave closer to the
white paper, with a pick after each quantum, the quantum being something
in the range [0.7ms:2*tick).
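As a small self-contained illustration of that range (assumed values,
not kernel code):

#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_MSEC	1000000ULL

/* clamp a requested slice into the quantum range [0.7ms, 2 * tick) */
static uint64_t quantum_ns(uint64_t base_slice_ns, uint64_t tick_ns)
{
	uint64_t lo = 700 * NSEC_PER_MSEC / 1000;	/* 0.7 ms */
	uint64_t hi = 2 * tick_ns;			/* exclusive upper bound */

	if (base_slice_ns < lo)
		return lo;
	if (base_slice_ns >= hi)
		return hi - 1;
	return base_slice_ns;
}

int main(void)
{
	/* with a 1000 Hz tick (1 ms), a 3 ms slice is capped below 2 ms */
	printf("%llu ns\n",
	       (unsigned long long)quantum_ns(3 * NSEC_PER_MSEC, NSEC_PER_MSEC));
	return 0;
}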
>
> The problem with usage of normalized_sysctl_ values is that you then get
> behavioural differences between 1 and 8 CPUs or so. Also, perhaps its
normalized_sysctl_ values don't scale with the number of CPUs. In this
case, it's always 0.7ms, which is short enough compared to the 1ms tick
period to prevent the default irq accounting from keeping current
running for another tick.
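For comparison, a standalone sketch of the default logarithmic scaling
versus a fixed normalized value (approximate, not the exact kernel
formula):

#include <stdint.h>
#include <stdio.h>

static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

int main(void)
{
	uint64_t normalized_ns = 700000ULL;	/* 0.7 ms, independent of CPU count */

	for (unsigned int cpus = 1; cpus <= 16; cpus *= 2) {
		unsigned int capped = cpus > 8 ? 8 : cpus;
		unsigned int factor = 1 + ilog2_u32(capped);	/* log scaling */

		printf("%2u CPUs: scaled %llu ns, normalized %llu ns\n",
		       cpus,
		       (unsigned long long)(normalized_ns * factor),
		       (unsigned long long)normalized_ns);
	}
	return 0;
}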
> time to just nuke that whole scaling thing (I'm sure someone mentioned
> that a short while ago).
>
> > Regarding the "fix delayed requeue", I already get an update of
> > current before requeueing a delayed task. Do you have a use case in
> > mind?
>
> Ah, it was just from reading code, clearly I missed something. Happy to
> forget about that patch :-)