Message-ID: <20231013143414.GA36211@noisy.programming.kicks-ass.net>
Date: Fri, 13 Oct 2023 16:34:14 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Benjamin Segall <bsegall@...gle.com>
Cc: mingo@...nel.org, vincent.guittot@...aro.org,
linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, mgorman@...e.de,
bristot@...hat.com, corbet@....net, qyousef@...alina.io,
chris.hyser@...cle.com, patrick.bellasi@...bug.net, pjt@...gle.com,
pavel@....cz, qperret@...gle.com, tim.c.chen@...ux.intel.com,
joshdon@...gle.com, timj@....org, kprateek.nayak@....com,
yu.c.chen@...el.com, youssefesmat@...omium.org,
joel@...lfernandes.org, efault@....de, tglx@...utronix.de
Subject: Re: [PATCH 03/15] sched/fair: Add lag based placement
On Thu, Oct 12, 2023 at 12:15:12PM -0700, Benjamin Segall wrote:
> Peter Zijlstra <peterz@...radead.org> writes:
>
> > @@ -4853,49 +4872,119 @@ static void
> > place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > {
> > u64 vruntime = avg_vruntime(cfs_rq);
> > + s64 lag = 0;
> >
> > - /* sleeps up to a single latency don't count. */
> > - if (!initial) {
> > - unsigned long thresh;
> > + /*
> > + * Due to how V is constructed as the weighted average of entities,
> > + * adding tasks with positive lag, or removing tasks with negative lag
> > + * will move 'time' backwards, this can screw around with the lag of
> > + * other tasks.
> > + *
> > + * EEVDF: placement strategy #1 / #2
> > + */
>
> So the big problem with EEVDF #1 compared to #2/#3 and CFS (hacky though
> it is) is that it creates a significant perverse incentive to yield or
> spin until you see yourself be preempted, rather than just sleep (if you
> have any competition on the cpu). If you go to sleep immediately after
> doing work and happen to do so near the end of a slice (arguably what
> you _want_ to have happen overall), then you have to pay that negative
> lag in wakeup latency later, because it is maintained through any amount
> of sleep. (#1 or similar is good for reweight/migrate of course)
>
> #2 in theory could be abused by micro-sleeping right before you are
> preempted, but that isn't something tasks can really predict, unlike
> seeing more "don't go to sleep, just spin, the latency numbers are so
> much better" nonsense.
For giggles (cyclictest vs hackbench):
$ echo PLACE_LAG > /debug/sched/features
$ ./doit-latency-slice.sh
# Running 'sched/messaging' benchmark:
slice 30000000
# /dev/cpu_dma_latency set to 0us
# Min Latencies: 00051
# Avg Latencies: 00819
# Max Latencies: 172558
slice 3000000
# /dev/cpu_dma_latency set to 0us
# Min Latencies: 00033
# Avg Latencies: 00407
# Max Latencies: 12024
slice 300000
# /dev/cpu_dma_latency set to 0us
# Min Latencies: 00055
# Avg Latencies: 00395
# Max Latencies: 11780
$ echo NO_PLACE_LAG > /debug/sched/features
$ ./doit-latency-slice.sh
# Running 'sched/messaging' benchmark:
slice 30000000
# /dev/cpu_dma_latency set to 0us
# Min Latencies: 00069
# Avg Latencies: 69071
# Max Latencies: 1492250
slice 3000000
# /dev/cpu_dma_latency set to 0us
# Min Latencies: 00062
# Avg Latencies: 10215
# Max Latencies: 21209
slice 300000
# /dev/cpu_dma_latency set to 0us
# Min Latencies: 00055
# Avg Latencies: 00060
# Max Latencies: 03088
IOW, insanely worse latencies in most cases. This is because when
everybody starts at 0-lag, everybody is always eligible, and 'fairness'
goes out the window fast.
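(For reference: "eligible" is the EEVDF admission test -- an entity may only
be picked when its lag is non-negative, i.e. its vruntime is at or before the
weighted average. Simplified sketch, not the exact helper in the series:

	static bool entity_is_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
		/* lag_i = V - v_i; pickable iff lag_i >= 0 */
		return (s64)(avg_vruntime(cfs_rq) - se->vruntime) >= 0;
	}

With NO_PLACE_LAG a task that blocks right after eating through its slice
comes back at V with that history wiped, so it is immediately eligible again
and competes on equal footing with tasks that have barely run -- which is the
'fairness out the window' above.)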
Placement strategy #1 only really works when you have well-behaved tasks
(e.g. conforming to the periodic task model -- not waking up before their
time and all that).
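Rough numbers to illustrate (invented, not measured): with two equal-weight
tasks, the one that just ran a full 3ms slice while the other waited ends up
at roughly v = V + 1.5ms, i.e. -1.5ms of lag. Under strategy #1 that -1.5ms
survives an arbitrarily long sleep, so at the next wakeup the task is placed
behind V and is ineligible until V catches up -- Ben's wakeup-latency
penalty. A conforming periodic task blocks with lag ~= 0, so preserving it
across the sleep costs nothing; that is the case #1 is built for.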