Message-ID: <CANDhNCoMUrZppodAa0bAdds=M3S7u3VqAmiR_Qd-ow=kxDD9=g@mail.gmail.com>
Date: Wed, 18 Jun 2025 14:45:03 -0700
From: John Stultz <jstultz@...gle.com>
To: Kuyo Chang <kuyo.chang@...iatek.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, Matthias Brugger <matthias.bgg@...il.com>, 
	AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, linux-kernel@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, linux-mediatek@...ts.infradead.org, 
	"Team, Android" <kernel-team@...roid.com>
Subject: Re: [PATCH v2 1/1] sched/deadline: Fix dl_server runtime calculation formula

On Tue, Jun 17, 2025 at 8:54 AM Kuyo Chang <kuyo.chang@...iatek.com> wrote:
>
> From: kuyo chang <kuyo.chang@...iatek.com>
>
> [Symptom]
> The dl_server runtime accounting is based on frequency/capacity
> scale-invariance, but the runtime is expected to be absolute
> (wall-clock) time. This causes excessive RT latency.
>
> [Analysis]
> Consider the following case under a Big.LITTLE architecture:
>
> Assume the runtime is 50,000,000 ns, and the frequency/capacity
> scale factors are defined as below:
>
> Frequency scale factor: 100 (out of 1024)
> Capacity scale factor: 50 (out of 1024)
>
> First, by frequency scale-invariance, the runtime is scaled to
> 50,000,000 * 100 >> 10 = 4,882,812 ns.
> Then, by capacity scale-invariance, it is further scaled to
> 4,882,812 * 50 >> 10 = 238,418 ns.
>
> So the requested runtime ends up being scaled down to 238,418 ns.
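
(For anyone who wants to double check the arithmetic above, here's a
tiny userspace toy that reproduces it; the scale values of 100 and 50
and the shift of 10 are just the numbers from the example, not pulled
from the actual kernel code.)

  #include <stdio.h>

  /* Mimic "value * scale >> 10" as used in the example above. */
  static unsigned long long scale(unsigned long long delta,
                                  unsigned long long factor)
  {
          return (delta * factor) >> 10;
  }

  int main(void)
  {
          unsigned long long runtime = 50000000ULL;   /* 50ms in ns */

          runtime = scale(runtime, 100);  /* frequency scaling -> 4,882,812 */
          runtime = scale(runtime, 50);   /* capacity scaling  ->   238,418 */
          printf("scaled runtime: %llu ns\n", runtime);
          return 0;
  }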
>
> [Solution]
> The dl_server runtime should be accounted in fixed (absolute) time,
> as is done for RT bandwidth control.
> Fix the runtime calculation formula for the dl_server accordingly.
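
(Conceptually, not quoting the actual patch, the shape of the change is
that the dl_server's budget gets charged in wall-clock time while
normal deadline tasks keep the scale-invariant accounting, i.e.
something along the lines of:)

  /* Illustrative only; field/variable names may not match the patch. */
  if (dl_se->dl_server)
          dl_se->runtime -= delta_exec;           /* absolute time */
  else
          dl_se->runtime -= scaled_delta_exec;    /* freq/capacity scaled */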
>
> Signed-off-by: kuyo chang <kuyo.chang@...iatek.com>
> Acked-by: Juri Lelli <juri.lelli@...hat.com>
> Suggested-by: Peter Zijlstra <peterz@...radead.org>
>
> v1: https://lore.kernel.org/all/20250614020524.631521-1-kuyo.chang@mediatek.com/
>

Coding nits aside, I put together a quick test that affines a
SCHED_NORMAL spinner task and a SCHED_FIFO spinner task to a single
cpu to illustrate the issue.
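
(Roughly, the setup looks like the sketch below; this isn't my actual
test, just the minimal shape of it, with the cpu number and RT priority
picked arbitrarily.)

  #define _GNU_SOURCE
  #include <sched.h>
  #include <unistd.h>

  static void pin_to_cpu(int cpu)
  {
          cpu_set_t set;

          CPU_ZERO(&set);
          CPU_SET(cpu, &set);
          sched_setaffinity(0, sizeof(set), &set);
  }

  int main(void)
  {
          if (fork() == 0) {
                  /* Child: SCHED_FIFO spinner pinned to cpu 0
                   * (needs CAP_SYS_NICE/root for SCHED_FIFO). */
                  struct sched_param p = { .sched_priority = 10 };

                  pin_to_cpu(0);
                  sched_setscheduler(0, SCHED_FIFO, &p);
                  for (;;)
                          ;
          }
          /* Parent: SCHED_NORMAL spinner pinned to the same cpu */
          pin_to_cpu(0);
          for (;;)
                  ;
  }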

You can quickly see that the requested 50ms/sec dl_server runtime,
which holds on the big cpu, ends up being scaled out to 323ms/sec on
the little cpu, blocking RT tasks there for quite a while:
  https://github.com/johnstultz-work/misc/blob/main/images/2025-06-18_illustration-of-problem-dl-server-scaling.png

The wild thing about the illustration above is that, since my test
uses cpu spinners, cpufreq quickly maxes out. So it's really only
showing the capacity scaling between the big (cpu 7) and little
(cpu 0) cpus at their top frequencies.
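
(Quick sanity check on the numbers: 50ms stretching out to ~323ms
implies a combined scale factor of roughly 50/323 ~= 158/1024, which
would be about the little cpu's relative capacity at its top frequency
in this setup; I haven't double checked the exact capacity value.)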

When I capped cpu 0's max frequency to the lowest available, the
behavior without the patch is crazy:
  https://github.com/johnstultz-work/misc/blob/main/images/2025-06-18_dl-server-scaling-with-cpufreq-lowered.png

Though the image alone maybe isn't as clear, in that case we see that
once the RT task has run for ~650ms, the dl_server kicks in and blocks
it (and any other RT task) from running for over *10 minutes*!

And with the fix to avoid scaling the fair_server runtime, the results
look much more sane:
  https://github.com/johnstultz-work/misc/blob/main/images/2025-06-18_with-patch-to-not-scale-dl-server-fixed.png

So I'm very happy to add:
  Tested-by: John Stultz <jstultz@...gle.com>

And I hope this gets upstream (and into -stable) in some form quickly.

Thanks so much to Kuyo and others on his team for reporting and
root-causing this issue!

thanks
-john
