Message-ID: <3c67ae44-5244-4341-9edd-04a93b1cb290@meta.com>
Date: Tue, 15 Jul 2025 10:55:03 -0400
From: Chris Mason <clm@...a.com>
To: Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 02/12] sched/deadline: Less agressive dl_server
handling
On 7/14/25 6:56 PM, Mel Gorman wrote:
> On Wed, Jul 02, 2025 at 01:49:26PM +0200, Peter Zijlstra wrote:
>> Chris reported that commit 5f6bd380c7bd ("sched/rt: Remove default
>> bandwidth control") caused a significant dip in his favourite
>> benchmark of the day. Simply disabling dl_server cured things.
>>
>
> Unrelated to the patch but I've been doing a bit of archaeology recently
> finding the motivation for various decisions, and paragraphs like this
> have been painful (the most recent was figuring out why a decision was made
> for 2.6.32). If the load was described, can you add a Link: tag? If the
> workload is proprietary, cannot be described, or would be impractical to
> independently create, then can that be stated here instead?
>
Hi Mel,
"benchmark of the day" is pretty accurate, since I usually just bash on
schbench until I see roughly the same problem that I'm debugging from
production. This time, it was actually a networking benchmark (uperf),
but the setup for that is more involved.
This other thread describes the load, with links to schbench and the command
line:
https://lore.kernel.org/lkml/20250626144017.1510594-2-clm@fb.com/
The short version:
https://github.com/masoncl/schbench.git
schbench -L -m 4 -M auto -t 256 -n 0 -r 0 -s 0
- 4 CPUs waking up all the other CPUs constantly (pretending to be network irqs)
- 1024 total worker threads spread over the other CPUs
- all the workers immediately going idle after waking
- single-socket machine with ~250 cores and HT
The basic recipe for the regression is as many CPUs as possible going in
and out of idle.
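If it helps, here's a minimal pthread sketch of that pattern. It is not
schbench, and the thread counts and names are made up: a handful of waker
threads constantly signal a pool of workers, and each worker goes straight
back to sleep after being woken, so CPUs keep bouncing in and out of idle.

/*
 * Minimal sketch of the wake/idle churn described above (not schbench
 * itself; thread counts are illustrative, scaled down from -m 4 -t 256).
 *
 * Build with: gcc -O2 -pthread wake_churn.c -o wake_churn
 */
#include <pthread.h>
#include <stdio.h>

#define NR_WAKERS	4
#define NR_WORKERS	64	/* scaled down from the 1024 workers above */
#define NR_ROUNDS	10000

struct worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int kick;	/* set by a waker, consumed by the worker */
	int done;
};

static struct worker workers[NR_WORKERS];

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	while (!w->done) {
		/* sleep until kicked, then immediately go idle again */
		while (!w->kick && !w->done)
			pthread_cond_wait(&w->cond, &w->lock);
		w->kick = 0;
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static void *waker_fn(void *arg)
{
	long id = (long)arg;

	/* each waker round-robins over its share of the workers */
	for (int round = 0; round < NR_ROUNDS; round++) {
		for (int i = id; i < NR_WORKERS; i += NR_WAKERS) {
			struct worker *w = &workers[i];

			pthread_mutex_lock(&w->lock);
			w->kick = 1;
			pthread_cond_signal(&w->cond);
			pthread_mutex_unlock(&w->lock);
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t wt[NR_WORKERS], kt[NR_WAKERS];

	for (int i = 0; i < NR_WORKERS; i++) {
		pthread_mutex_init(&workers[i].lock, NULL);
		pthread_cond_init(&workers[i].cond, NULL);
		pthread_create(&wt[i], NULL, worker_fn, &workers[i]);
	}
	for (long i = 0; i < NR_WAKERS; i++)
		pthread_create(&kt[i], NULL, waker_fn, (void *)i);

	for (int i = 0; i < NR_WAKERS; i++)
		pthread_join(kt[i], NULL);

	/* tell the workers to exit and reap them */
	for (int i = 0; i < NR_WORKERS; i++) {
		pthread_mutex_lock(&workers[i].lock);
		workers[i].done = 1;
		pthread_cond_signal(&workers[i].cond);
		pthread_mutex_unlock(&workers[i].lock);
		pthread_join(wt[i], NULL);
	}
	return 0;
}

It obviously won't reproduce the exact numbers, but on a big machine the
wake-then-immediately-idle churn is the same basic shape.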
(I know you're really asking for these details in the commit or in the
comments, but hopefully this is useful for Link:'ing)
-chris