Message-ID: <bd737a9a498638b253d6e273cbbea108b6c5a4b0.camel@gmx.de>
Date: Thu, 07 Nov 2024 05:03:49 +0100
From: Mike Galbraith <efault@....de>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Phil Auld <pauld@...hat.com>, mingo@...hat.com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
linux-kernel@...r.kernel.org, kprateek.nayak@....com,
wuyun.abel@...edance.com, youssefesmat@...omium.org, tglx@...utronix.de
Subject: Re: [PATCH 17/24] sched/fair: Implement delayed dequeue
On Wed, 2024-11-06 at 16:22 +0100, Mike Galbraith wrote:
>
> Hm, if we try to bounce a preempted task and fail, the wakeup_preempt()
> call won't happen.
Zzzt, wrong: falling through still leads to the bottom of a wakeup with
its preempt check...
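
For anyone following along at home, the shape of that path, paraphrased
and heavily simplified (not verbatim kernel code, details elided):

/* Paraphrased, heavily simplified wakeup path (not verbatim kernel code). */
static int try_to_wake_up(struct task_struct *p, unsigned int state,
			  int wake_flags)
{
	int cpu, success = 1;

	/* ... */
	if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
		goto out;	/* still queued, requeued in place */

	/* Try to place the wakee, possibly bouncing it to an idle CPU. */
	cpu = select_task_rq(p, p->wake_cpu, wake_flags);

	/*
	 * Bottom of the wakeup: ttwu_queue() -> ttwu_do_activate()
	 * ends with wakeup_preempt(), so a failed bounce that falls
	 * through to here still gets its preempt check.
	 */
	ttwu_queue(p, cpu, wake_flags);
out:
	/* ... */
	return success;
}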
> Bouncing preempted tasks is a double-edged sword..
..but that bit is pretty intriguing. Looking only at service latency
and utilization at decision time (the prime mission), migrating to an
idle CPU is an obvious win.
It's also a clear win for communication latency when the buddies are
NOT the popular but misused end-to-end latency measurement tools a la
TCP_RR with only microscopic concurrency. For the other netperf modes of
operation, there's no shortage of concurrency to salvage *and get out
of the communication stream*, and I think that applies to wide swaths
of the real world. What makes it intriguing is the cross-over point
where "stacking is the stupidest idea ever" becomes "stacking may put
my and my buddy's wide butts directly in our own communication stream,
but that's less pain than what unrelated wide butts inflict on top of
higher LLC vs L2 latency".
For UDP_STREAM (async to the bone), there is no such point; it would
seemingly prefer its buddy call from orbit, but for its more reasonable
TCP brother and ilk, there is.
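
To put a shape on that cross-over, a toy inequality of my own (purely
illustrative, nothing from the patch): stacking wins once waiting behind
the buddy plus L2-hot communication costs less than LLC-distance
communication plus what the unrelated hogs inflict.

/* Toy break-even check; all inputs are hypothetical microsecond costs. */
#include <stdbool.h>

static bool stacking_wins(unsigned int wait_behind_buddy_us,
			  unsigned int l2_hot_comm_us,
			  unsigned int llc_comm_us,
			  unsigned int interference_us)
{
	/* Our own traffic jam vs. the crowd plus the longer wire. */
	return wait_behind_buddy_us + l2_hot_comm_us <
	       llc_comm_us + interference_us;
}

On an otherwise idle box interference_us is ~0 and stacking loses; pile
on compute hogs and the right side grows until the bouncy castle wins,
which is what the TCP_STREAM numbers below show.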
Sample numbers (talk): interference is 8 unbound 88% compute instances,
box is a crusty ole 8-rq i7-4790.
UDP_STREAM-1  unbound     Avg: 47135  Sum: 47135
UDP_STREAM-1  stacked     Avg: 39602  Sum: 39602
UDP_STREAM-1  cross-smt   Avg: 61599  Sum: 61599
UDP_STREAM-1  cross-core  Avg: 67680  Sum: 67680
(distance very good!)

TCP_STREAM-1  unbound     Avg: 26299  Sum: 26299
TCP_STREAM-1  stacked     Avg: 27893  Sum: 27893
TCP_STREAM-1  cross-smt   Avg: 16728  Sum: 16728
TCP_STREAM-1  cross-core  Avg: 13877  Sum: 13877
(idiot, distance NOT good, bouncy castle very good!)
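
For the curious, the pinning variants above can be approximated with
netperf's -T local,remote CPU binding. A sketch, assuming stress-ng for
the hogs and (0,4)-style SMT sibling numbering, neither of which is
necessarily what produced the numbers above:

# Sketch only; commands and CPU numbering are assumptions.
stress-ng --cpu 8 --cpu-load 88 &       # 8 unbound ~88% compute hogs

# Verify sibling pairs first:
# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
netserver
netperf -t TCP_STREAM -H 127.0.0.1 -l 30            # unbound
netperf -T 0,0 -t TCP_STREAM -H 127.0.0.1 -l 30     # stacked
netperf -T 0,4 -t TCP_STREAM -H 127.0.0.1 -l 30     # cross-smt
netperf -T 0,1 -t TCP_STREAM -H 127.0.0.1 -l 30     # cross-core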
Service latency dominates... not quite always, and bouncing tasks about
is simultaneously the only sane thing to do and pure evil... like
everything else in sched land, it's a hard game to win :)
I built that patch out of curiosity, and yeah, set_next_task_fair()
finding a cfs_rq->curr ends play time pretty quickly. Too bad my
service latency is a bit dinged up; bouncing preempted wakees about
promises to be interesting.
-Mike