Message-ID: <20240408090639.GD21904@noisy.programming.kicks-ass.net>
Date: Mon, 8 Apr 2024 11:06:39 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Chen Yu <yu.c.chen@...el.com>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org, kprateek.nayak@....com,
wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de,
yu.chen.surf@...il.com
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue
On Sat, Apr 06, 2024 at 05:23:25PM +0800, Chen Yu wrote:
> The 99th percentile wakeup latency increases a little bit, and should be in
> the acceptable range (25 -> 31 us).
Ah, my test runs haven't been stable enough to observe that.
> Meanwhile the throughput increases accordingly. Here are
> the possible reasons I can think of:
>
> 1. wakeup latency: The time to find an eligible entity in the tree
> during wakeup might take longer - if there are more delayed-dequeue
> tasks in the tree.
Another possible cause might be that previously a schedule() would be
1 dequeue, 1 pick.
But now it can be much more variable, a pick can basically do N dequeues
and N+1 picks.
So not only do we do more picks, but if you're focussed on worst case
latency, it goes up, because we can do multiple dequeues for a single
pick.
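
To make the counting concrete, here is a toy model of the above. It is a
sketch only, not the kernel code: struct toy_entity, struct pick_cost and
toy_pick() are made-up names, and the array stands in for the tree in pick
order. Each delayed entity found by a pick gets dequeued for real and the
pick is retried, so N delayed entities ahead of a runnable one cost
N dequeues and N+1 picks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy entity: only the one field this sketch needs. */
struct toy_entity {
	bool sched_delayed;	/* went to sleep but was left enqueued */
};

/* Work done by a single toy_pick() call. */
struct pick_cost {
	unsigned int picks;
	unsigned int dequeues;
};

/*
 * Walk the candidates in pick order. A delayed entity cannot run, so it
 * is counted as dequeued and the pick is retried with the next one.
 * Returns the index of the entity that was picked, or -1 if every
 * candidate turned out to be delayed.
 */
static int toy_pick(const struct toy_entity *q, size_t n,
		    struct pick_cost *cost)
{
	cost->picks = 0;
	cost->dequeues = 0;

	for (size_t i = 0; i < n; i++) {
		cost->picks++;
		if (!q[i].sched_delayed)
			return (int)i;
		cost->dequeues++;
	}

	return -1;
}
```

With three delayed entities in front of a runnable one, a single call
reports 4 picks and 3 dequeues, which is where the worst case grows.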
If we find this to really be a problem, I had some half-baked ideas to
fix it, but they added significant complexity, so let's keep it simple
until need proves we need more.
> 2. throughput: Inhibiting task dequeue can reduce how often the task
> group's load_avg is touched: dequeue_entity() -> { update_load_avg(), update_cfs_group() },
> which reduces cache contention among CPUs and improves throughput.
Ah, yes, there's that.
> > +	} else {
> > +		bool sleep = flags & DEQUEUE_SLEEP;
> > +
> > +		SCHED_WARN_ON(sleep && se->sched_delayed);
> > +		update_curr(cfs_rq);
> > +
> > +		if (sched_feat(DELAY_DEQUEUE) && sleep &&
> > +		    !entity_eligible(cfs_rq, se)) {
>
> Regarding the eligible check, it was found that there could be an overflow
> issue that causes false negatives from entity_eligible(), as described here:
> https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
> and also reported on another machine
> https://lore.kernel.org/lkml/ZeCo7STWxq+oyN2U@gmail.com/
> I don't have a good idea of how to avoid that overflow properly. While I'm
> trying to reproduce it locally, do you have any guidance on how to address it?
I have not yet seen those, let me go stare at them now. Thanks!
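
For anyone following along, a toy illustration of how a signed 64-bit
product can flip such a check. This is an assumption about the general
shape of the problem, not a summary of the linked reports: it assumes
eligibility is decided by comparing a weighted sum against key * load in
s64, and toy_eligible_wrapping() and the values used are made up. It relies
on the GCC/Clang builtin __builtin_mul_overflow().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed shape of the check: eligible when avg >= key * load, where key
 * is the entity's vruntime relative to some base. The product is
 * computed with wrap-around, the way a plain s64 multiply behaves in
 * the kernel; *overflowed reports whether it actually wrapped.
 */
static bool toy_eligible_wrapping(int64_t avg, int64_t key, int64_t load,
				  bool *overflowed)
{
	int64_t prod;

	*overflowed = __builtin_mul_overflow(key, load, &prod);

	return avg >= prod;
}
```

With avg = 0, key = -(1 << 62) and load = 3 the true product is far below
zero, so the entity should be eligible, but the wrapped product comes out
as +(1 << 62) and the check returns false: a false negative.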