linux-kernel - Re: wakeup_affine_weight() is b0rked - was Re: [PATCH 2/2] sched/fair: Scale wakeup granularity relative to nr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0374eabd5eb0f7fe6527beb187ccb1e88965ab2b.camel@gmx.de>
Date:   Wed, 06 Oct 2021 08:46:07 +0200
From:   Mike Galbraith <efault@....de>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Barry Song <21cnbao@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: wakeup_affine_weight() is b0rked - was Re: [PATCH 2/2]
 sched/fair: Scale wakeup granularity relative to nr_running

On Tue, 2021-10-05 at 10:31 +0100, Mel Gorman wrote:
>
> If you write a formal patch with a Signed-off-by, I'll use it as a baseline
> for the wakegran work or submit it directly. That way, I'll get any LKP
> reports, user regression reports on lkml, any regression reports via
> openSUSE etc and deal with them.
>
> Based on the results I have so far, I would not be twiddling it further
> but that might change when a full range of machines have completed all
> of their tests. Ideally, I would do some tracing to confirm that maximum
> runqueue depth is really reduced by the path.

Ok, I amended it, adding probably way too many words for a dinky bend-
adjust.  Feel free to do whatever you like with every bit below.

sched: Make wake_wide() handle wakees with no wakee_flips

While looking into a wakeup time task stacking problem, noticed
that wake_wide() wasn't recognizing X as a waker-of-many despite
it regularly burst waking 24 QXcbEventQueue threads.  The reason
for this lies in the heuristic requiring both the multi-waker and
its minions to be earning wakee_flips, ie both wake more than one
task. X earns plenty, but the event threads rarely earn any,
allowing the lot to meet wake_affine_weight(), where its use of
slow to update load averages MAY direct the entire burst toward
the waker's CPU, where they WILL stack if SIS can't deflect them.

To combat this, have the multi-waker (X in this case) trickle
enough flips to its wakees to keep those who don't earn flips
barely eligible.  To reduce aging taking too many of the thread
pool below the limit due to periods of inactivity, continue to
wake wide IFF the waker has a markedly elevated flip frequency.

This improved things for the X+QXcbEventQueue burst, but is a
bit hacky. We need a better M:N load activity differentiator.

Note: given wake_wide()'s mission is to recognize when thread
pool activity exceeds sd_llc_size, it logically SHOULD play no
role whatsoever in boxen with a single LLC, there being no other
LLCs to expand/shrink load distribution to/from.  While this
patchlet hopefully improves M:N detection a bit in general, it
fixes nothing.

Signed-off-by: Mike Galbraith <efault@....de>
---
 kernel/sched/fair.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5869,6 +5869,15 @@ static void record_wakee(struct task_str
 	}

 	if (current->last_wakee != p) {
+		int min = __this_cpu_read(sd_llc_size) << 1;
+		/*
+		 * Couple waker flips to the wakee for the case where it
+		 * doesn't accrue any of its own, taking care to not push
+		 * it high enough to break the wake_wide() waker:wakees
+		 * heuristic for those that do accrue their own flips.
+		 */
+		if (current->wakee_flips > p->wakee_flips * min)
+			p->wakee_flips++;
 		current->last_wakee = p;
 		current->wakee_flips++;
 	}
@@ -5899,7 +5908,7 @@ static int wake_wide(struct task_struct

 	if (master < slave)
 		swap(master, slave);
-	if (slave < factor || master < slave * factor)
+	if ((slave < factor && master < (factor>>1)*factor) || master < slave * factor)
 		return 0;
 	return 1;
 }