Message-ID: <d5cb15bd-1096-45a8-9da6-a37ff490714c@linux.ibm.com>
Date: Wed, 9 Jul 2025 22:16:14 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        vschneid@...hat.com, clm@...a.com,
        Madhavan Srinivasan <maddy@...ux.ibm.com>
Subject: Re: [PATCH v2 00/12] sched: Address schbench regression



On 7/9/25 00:32, Peter Zijlstra wrote:
> On Mon, Jul 07, 2025 at 11:49:17PM +0530, Shrikanth Hegde wrote:
> 
>> Git bisect points to
>> # first bad commit: [dc968ba0544889883d0912360dd72d90f674c140] sched: Add ttwu_queue support for delayed tasks
> 
> Moo.. Are IPIs particularly expensive on your platform?
> 
> The 5 cores makes me think this is a partition of sorts, but IIRC the
> power LPAR stuff was fixed physical, so routing interrupts shouldn't be
> much more expensive vs native hardware.
> 

Yes, we call it a dedicated LPAR. (The hypervisor optimises such that overhead is minimal;
I think that is true for interrupts too.)
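
(If it helps narrow down the IPI question: one quick way to compare IPI rates
between the two kernels is to diff /proc/interrupts across a window of the run.
The grep patterns below are illustrative; the row names vary by arch, e.g. RES
on x86 vs the doorbell/IPI lines on powerpc.)

    # snapshot IPI-related counters over a 10s window while the benchmark runs
    grep -iE 'ipi|res|doorbell' /proc/interrupts > /tmp/ipi.before
    sleep 10
    grep -iE 'ipi|res|doorbell' /proc/interrupts > /tmp/ipi.after
    diff /tmp/ipi.before /tmp/ipi.after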


Some more test variations and numbers:

The system had some configs which I had messed up, such as CONFIG_SCHED_SMT=n. I copied the default
distro config back and ran the benchmark again. The numbers are slightly better than earlier, but
there is still a major regression. I also collected mpstat numbers; the utilization percentages are
much lower compared to earlier.
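
(For reference, each run is 30s of schbench with mpstat sampled alongside. The
flags below are illustrative, using schbench's standard -m/-t/-r options; the
literal command line is not quoted in this thread.)

    # illustrative invocation -- exact flags are an assumption
    ./schbench -m 4 -t 16 -r 30   # -m message threads, -t workers per message thread, -r runtime (s)
    mpstat 5 2                    # two 5-second samples of overall CPU utilization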

--------------------------------------------------------------------------
base: 8784fb5fa2e0 (tip/master)

Wakeup Latencies percentiles (usec) runtime 30 (s) (41567569 total samples)
           50.0th: 11         (10767158 samples)
           90.0th: 22         (16782627 samples)
         * 99.0th: 36         (3347363 samples)
           99.9th: 52         (344977 samples)
           min=1, max=731
RPS percentiles (requests) runtime 30 (s) (31 total samples)
           20.0th: 1443840    (31 samples)
         * 50.0th: 1443840    (0 samples)
           90.0th: 1443840    (0 samples)
           min=1433480, max=1444037
average rps: 1442889.23

CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    3.24    0.00   11.39    0.00   37.30    0.00    0.00    0.00    0.00   48.07
all    2.59    0.00   11.56    0.00   37.62    0.00    0.00    0.00    0.00   48.23



base + clm's patch + series:
Wakeup Latencies percentiles (usec) runtime 30 (s) (27166787 total samples)
           50.0th: 57         (8242048 samples)
           90.0th: 120        (10677365 samples)
         * 99.0th: 182        (2435082 samples)
           99.9th: 262        (241664 samples)
           min=1, max=89984
RPS percentiles (requests) runtime 30 (s) (31 total samples)
           20.0th: 896000     (8 samples)
         * 50.0th: 902144     (10 samples)
           90.0th: 928768     (10 samples)
           min=881548, max=971101
average rps: 907530.10                                               <<< close to 40% drop in RPS.

CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
all    1.95    0.00    7.67    0.00   14.84    0.00    0.00    0.00    0.00   75.55
all    1.61    0.00    7.91    0.00   13.53    0.05    0.00    0.00    0.00   76.90

-----------------------------------------------------------------------------

- To be sure, I tried on another system, this one with 30 cores.

base:
Wakeup Latencies percentiles (usec) runtime 30 (s) (40339785 total samples)
           50.0th: 12         (12585268 samples)
           90.0th: 24         (15194626 samples)
         * 99.0th: 44         (3206872 samples)
           99.9th: 59         (320508 samples)
           min=1, max=1049
RPS percentiles (requests) runtime 30 (s) (31 total samples)
           20.0th: 1320960    (14 samples)
         * 50.0th: 1333248    (2 samples)
           90.0th: 1386496    (12 samples)
           min=1309615, max=1414281

base + clm's patch + series:
Wakeup Latencies percentiles (usec) runtime 30 (s) (34318584 total samples)
           50.0th: 23         (10486283 samples)
           90.0th: 64         (13436248 samples)
         * 99.0th: 122        (3039318 samples)
           99.9th: 166        (306231 samples)
           min=1, max=7255
RPS percentiles (requests) runtime 30 (s) (31 total samples)
           20.0th: 1006592    (8 samples)
         * 50.0th: 1239040    (9 samples)
           90.0th: 1259520    (11 samples)
           min=852462, max=1268841
average rps: 1144229.23                                             << close to a 10-15% drop in RPS


- Then I resized that 30 core LPAR into a 5 core LPAR to see if the issue pops up in a smaller
config. It did: I see a similar regression, a 40-50% drop in RPS.

- Then I made it a 6 core system, to see if this is due to any ping-ponging caused by the odd
number of cores. The numbers are similar to the 5 core case.
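
(For completeness, the topology after each resize can be double-checked with
something like the following; ppc64_cpu is from powerpc-utils, and the exact
grep fields are illustrative:)

    lscpu | grep -E '^(CPU\(s\)|Core|Socket|Thread)'   # logical CPUs, cores, SMT layout
    ppc64_cpu --cores-on                               # cores currently online in the LPAR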

- Maybe the regression is higher in smaller configurations.
