linux-kernel - Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

Message-ID: <20240408090639.GD21904@noisy.programming.kicks-ass.net>
Date: Mon, 8 Apr 2024 11:06:39 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Chen Yu <yu.c.chen@...el.com>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
	linux-kernel@...r.kernel.org, kprateek.nayak@....com,
	wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de,
	yu.chen.surf@...il.com
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

On Sat, Apr 06, 2024 at 05:23:25PM +0800, Chen Yu wrote:

> The 99th percentile wakeup latency increases a little bit, but should be
> within the acceptable range (25 -> 31 us).

Ah, my test runs haven't been stable enough to observe that.

> Meanwhile the throughput increases accordingly. Here are
> the possible reasons I can think of:
> 
> 1. wakeup latency: Finding an eligible entity in the tree during wakeup
>    might take longer if there are more delayed-dequeue tasks in the tree.

Another possible cause might be that previously a schedule() would be
1 dequeue, 1 pick.

But now it can be much more variable, a pick can basically do N dequeues
and N+1 picks.

So not only do we do more picks, but if you're focussed on worst case
latency, it goes up, because we can do multiple dequeues for a single
pick.

If we find this to really be a problem, I had some half-baked ideas to
fix it, but they added significant complexity, so let's keep it simple
until there's evidence we need more.

> 2. throughput: Inhibiting task dequeue can reduce how often the task
>    group's load_avg is touched via dequeue_entity() -> { update_load_avg(),
>    update_cfs_group() }, which reduces cache contention among CPUs and
>    improves throughput.

Ah, yes, there's that.

> > +	} else {
> > +		bool sleep = flags & DEQUEUE_SLEEP;
> > +
> > +		SCHED_WARN_ON(sleep && se->sched_delayed);
> > +		update_curr(cfs_rq);
> > +
> > +		if (sched_feat(DELAY_DEQUEUE) && sleep &&
> > +		    !entity_eligible(cfs_rq, se)) {
> 
> Regarding the eligibility check, it was found that there could be an
> overflow issue that causes false negatives from entity_eligible(), as described here:
> https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
> and also reported on another machine
> https://lore.kernel.org/lkml/ZeCo7STWxq+oyN2U@gmail.com/
> I don't have a good idea for avoiding that overflow properly. While I'm trying to
> reproduce it locally, do you have any guidance on how to address it?

I have not yet seen those, let me go stare at them now. Thanks!