linux-kernel - Re: [PATCH V2] sched/fair: Dequeue sched_delayed tasks when waking to a busy CPU

Message-ID: <ce23f09226a7240d4f15b378959c9db9bc5e466f.camel@gmx.de>
Date: Wed, 27 Nov 2024 15:13:22 +0100
From: Mike Galbraith <efault@....de>
To: K Prateek Nayak <kprateek.nayak@....com>, Phil Auld <pauld@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com, 
 juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com,  rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, vschneid@...hat.com,  linux-kernel@...r.kernel.org,
 wuyun.abel@...edance.com,  youssefesmat@...omium.org, tglx@...utronix.de
Subject: Re: [PATCH V2] sched/fair: Dequeue sched_delayed tasks when waking
 to a busy CPU

On Tue, 2024-11-26 at 07:30 +0100, Mike Galbraith wrote:
>
> The intent is to blunt the instrument a bit. Paul should have a highly
> active interrupt source, which will give wakeup credit to whatever is
> sitting on that CPU, breaking 1:1 connections.. a little bit.   That's
> why it still migrates tbench buddies, but NOT at a rate that turns a
> tbench progression into a new low regression.

BTW, the reason for the tbench wreckage being so bad is that when the
box is near saturation, not only are a portion of the surviving
sched_delayed tasks affine wakeups (always the optimal configuration
for this fast mover client/server pair in an L3 equipped box), they are
exclusively affine wakeups.  That is most definitely gonna hurt.

When saturating, that becomes the best option for a lot of client/server
pairs, even those with a lot of concurrency.  Turning them loose to
migrate at that time is far more likely than not to hurt a LOT, so V1
was doomed.

> The hope is that the
> load shift caused by that active interrupt source is enough to give
> Paul's regression some of the help it demonstrated wanting, without the
> collateral damage.  It might now be so weak as to not meet the
> "meaningful" in my question, in which case it lands on the ginormous
> pile of meh, sorta works, but why would anyone care.

In my IO challenged box, patch is useless to fio, nothing can help a
load where all of the IO action, and wimpy action at that, is nailed to
one CPU.  I can see it helping other latency sensitive stuff, like say
1:N mother of all work and/or control threads (and ilk), but if Phil's
problematic box looks anything like this box.. nah, it's a long reach.

	-Mike