Message-ID: <20241104130515.GB749675@pauld.westford.csb>
Date: Mon, 4 Nov 2024 08:05:15 -0500
From: Phil Auld <pauld@...hat.com>
To: Mike Galbraith <efault@....de>
Cc: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org,
kprateek.nayak@....com, wuyun.abel@...edance.com,
youssefesmat@...omium.org, tglx@...utronix.de
Subject: Re: [PATCH 17/24] sched/fair: Implement delayed dequeue
On Sat, Nov 02, 2024 at 05:32:14AM +0100 Mike Galbraith wrote:
> On Fri, 2024-11-01 at 16:07 -0400, Phil Auld wrote:
>
>
> > Thanks for jumping in. My jargon decoder ring seems to be failing me
> > so I'm not completely sure what you are saying below :)
> >
> > "buddies" you mean tasks that waking each other up and sleeping.
> > And one runs for longer than the other, right?
>
> Yeah, buddies are related waker/wakee, 1:1, 1:N or M:N, excluding tasks
> that happen to be sitting on a CPU where, say, a timer fires, an IRQ leads
> to a wakeup of lord knows what, lock wakeups etc etc etc. I think Peter
> coined the term buddy to mean that (less typing), and it stuck.
>
Thanks!
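
For the archives, here's a minimal user-space sketch of a 1:1 buddy
pair, using a pipe-based ping-pong as a stand-in for tbench's socket
traffic (illustrative only, not tbench itself):

/*
 * 1:1 buddies: two tasks that do little but wake each other in turn.
 * The child plays tbench_srv (read, reply); the parent plays tbench
 * (send, do the wider share of the work, wait).
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int ping[2], pong[2];
	char buf = 0;

	if (pipe(ping) || pipe(pong)) {
		perror("pipe");
		return 1;
	}

	if (fork() == 0) {
		for (;;) {			/* wakee: skinny buddy */
			read(ping[0], &buf, 1);
			write(pong[1], &buf, 1);
		}
	}

	for (;;) {				/* waker: wide buddy */
		write(ping[1], &buf, 1);
		/* heavier post-wakeup processing lives here */
		read(pong[0], &buf, 1);
	}
}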
> > > 1 tbench buddy pair scheduled cross core.
> > >
> > >   PID USER  PR  NI  VIRT  RES  SHR S  %CPU  %MEM    TIME+ P COMMAND
> > > 13770 root  20   0 21424 1920 1792 S 60.13 0.012  0:33.81 3 tbench
> > > 13771 root  20   0  4720  896  768 S 46.84 0.006  0:26.10 2 tbench_srv
> >
> > > Note 60/47 utilization, now pinned/stacked.
> > >
> > > 6.1.114-cfs
> > >  PID USER  PR  NI  VIRT  RES  SHR S  %CPU  %MEM    TIME+ P COMMAND
> > > 4407 root  20   0 21424 1980 1772 R 50.00 0.012  0:29.20 3 tbench
> > > 4408 root  20   0  4720  124    0 R 50.00 0.001  0:28.76 3 tbench_srv
> >
> > What is the difference between these first two? The first is on
> > separate cores so they don't interfere with each other? And the second is
> > pinned to the same core?
>
> Yeah, see 'P'. Given CPU headroom, a tbench pair can consume ~107%.
> They're not fully synchronous.. this wouldn't be relevant here/now if
> they were :)
>
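
(Aside for anyone reproducing the stacked runs: pinning both tasks to
one CPU can be done with taskset, or programmatically with something
like the sketch below; CPU 3 is just the example from the top output
above:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling task to a single CPU (what taskset -c does). */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);	/* 0 == self */
}

int main(void)
{
	if (pin_to_cpu(3)) {		/* CPU 3, as in the pinned run */
		perror("sched_setaffinity");
		return 1;
	}
	/* exec or run the workload here */
	return 0;
}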
> > > Note what happens to the lighter tbench_srv. Consuming red hot L2 data,
> > > it can utilize a full 50%, but it must first preempt its wide-bottom buddy.
> > >
> >
> > We've got "light" and "wide" here, which is a bit mixed metaphorically
> > :)
>
> Wide, skinny, feather-weight or lard-ball, they all work for me.
>
> > So here CFS is letting the wakee preempt the waker and providing pretty
> > equal fairness. And hot L2 caching is masking the asymmetry.
>
> No, it's way simpler: preemption slices through the only thing it can
> slice through, the post-wakeup concurrent bits.. which otherwise sit
> directly in the communication stream as a lump of latency in a
> latency-bound operation.
>
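(For readers without the source handy, a toy model of the wakeup
preemption test being discussed; "lag" and "deadline" here are
stand-ins for the scheduler's per-entity bookkeeping, and the real
logic in check_preempt_wakeup_fair()/pick_eevdf() has more conditions:)

#include <stdbool.h>

/*
 * Toy model, not kernel code: an entity is eligible when it isn't
 * overdrawn on CPU service (non-negative lag), and a wakeup preempts
 * when the eligible wakee's virtual deadline is earlier than the
 * current task's.  That preemption is what cuts the waker's
 * post-wakeup processing out of the round trip.
 */
struct entity {
	long long lag;			/* +ve: owed service, -ve: overdrawn */
	unsigned long long deadline;	/* virtual deadline */
};

static bool eligible(const struct entity *se)
{
	return se->lag >= 0;
}

static bool wakeup_preempts(const struct entity *curr,
			    const struct entity *wakee)
{
	return eligible(wakee) && wakee->deadline < curr->deadline;
}
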
> >
> > With wakeup preemption off it doesn't help in my case. I was thinking
> > maybe the preemption was preventing some batching of IO completions
> > or initiations. But that was wrong, it seems.
>
> Dunno.
>
> > Does it also possibly make wakeup migration less likely and thus increase
> > stacking?
>
> The buddy being preempted certainly won't be wakeup migrated, because
> it won't sleep. Two very sleepy tasks when bandwidth constrained become
> one 100% hog and one 99.99% hog when CPU constrained.
>
Not the waker, who gets preempted, but the wakee may be a bit more
sticky on his current CPU, and thus stack more, since he's still
in that runqueue. But that's just a mental exercise, trying to
find things that are directly related to delay dequeue. No observation
other than the overall perf hit.
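
Roughly what I mean, paraphrased from the try_to_wake_up() fast path
(heavily elided; see kernel/sched/core.c for the real thing):

	/*
	 * A task still on a runqueue is woken in place and never reaches
	 * the placement code below.  With DELAY_DEQUEUE a "sleeping"
	 * wakee can still be p->on_rq, so its wakeup takes this early
	 * exit and select_task_rq() is never consulted.
	 */
	if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags))
		goto out;

	/* only fully dequeued tasks get a fresh placement decision */
	cpu = select_task_rq(p, ...);
	ttwu_queue(p, cpu, wake_flags);

So a delay-dequeued wakee stays put on its old runqueue, where it may
stack behind the buddy it would otherwise have escaped.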
> > > Bottom line, box full of 1:1 buddies pairing up and stacking in L2.
> > >
> > > tbench 8
> > > 6.1.114-cfs 3674.37 MB/sec
> > > 6.1.114-eevdf 3505.25 MB/sec -delay_dequeue
> > > 3701.66 MB/sec +delay_dequeue
> > >
> > > For tbench, preemption = shorter turnaround = higher throughput.
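
(For anyone wanting to flip this at runtime: the ±delay_dequeue runs
above should correspond to the DELAY_DEQUEUE sched feature, i.e.
something like "echo NO_DELAY_DEQUEUE > /sys/kernel/debug/sched/features"
to disable it, assuming debugfs is mounted in the usual place.)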
> >
> > So here you have a benchmark that gets a ~5% boost from
> > delayed_dequeue.
> >
> > But I've got one that gets a 20% penalty, so I'm not exactly sure what
> > to make of that. Clearly FIO does not have the same pattern as tbench.
>
> There are basically two options in sched-land: shave fastpath cycles,
> or some variant of Rob Peter to pay Paul ;-)
>
That Peter is cranky :)
Cheers,
Phil
> -Mike
>