Message-ID: <20250205151026.13061-1-hagarhem@amazon.com>
Date: Wed, 5 Feb 2025 15:10:24 +0000
From: Hagar Hemdan <hagarhem@...zon.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
	<vincent.guittot@...aro.org>
CC: Hagar Hemdan <hagarhem@...zon.com>, wuchi <wuchi.zero@...il.com>,
	<linux-kernel@...r.kernel.org>, <Mohamed@...zon.com>,
	<Abuelfotoh@...zon.com>, Hazem <abuehaze@...zon.com>
Subject: BUG Report: Fork benchmark drop by 30% on aarch64

Hi,

There is about a 30% drop in the fork benchmark [1] on aarch64 and a 10%
drop on x86_64 with kernel v6.13.1.

Git bisect pointed to commit eff6c8ce8d4d ("sched/core: Reduce cost
of sched_move_task when config autogroup"), which has been in mainline
since v6.4-rc1.

The regression only happens when the number of CPUs is equal to the
number of threads [2] that the fork test creates, which means it is only
visible under CPU contention.

I used an m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM for
ARM64 and an m6a.xlarge, also with 4 vCPUs and 16 GiB RAM, for x86_64.

I noticed that this regression exists only when the autogroup config is
enabled.
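
For anyone reproducing this, a minimal sketch for checking the runtime
autogroup state before a run. It assumes the state is read from the
/proc/sys/kernel/sched_autogroup_enabled sysctl, which is only present on
kernels built with CONFIG_SCHED_AUTOGROUP; the report does not say how
autogroup was toggled, so this method and the helper name
autogroup_state are illustrative only.

```python
from pathlib import Path

# Runtime toggle for autogroup; absent when CONFIG_SCHED_AUTOGROUP is not built in.
AUTOGROUP_SYSCTL = "/proc/sys/kernel/sched_autogroup_enabled"


def autogroup_state(path=AUTOGROUP_SYSCTL):
    """Return 'enabled', 'disabled', or 'unavailable' (hypothetical helper)."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable: autogroup support is probably not compiled in.
        return "unavailable"
    return "enabled" if value == "1" else "disabled"


if __name__ == "__main__":
    print("autogroup:", autogroup_state())
```

Recording this value next to each benchmark result makes it easier to tell
which of the two tables a given run belongs to.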

Results of the fork test with these combinations, with autogroup enabled:

Arch      | commit eff6c8ce8d4d | Fork Result (lps)  |  %Cpu(s)
----------+---------------------+--------------------+------------------
aarch64   | without             | 28677.0            |  3.2 us, 96.7 sy
aarch64   | with                | 19860.7 (30% drop) |  2.7 us, 79.4 sy
x86_64    | without             | 27776.2            |  3.1 us, 96.9 sy
x86_64    | with                | 25020.6 (10% drop) |  4.1 us, 93.2 sy
----------+---------------------+--------------------+------------------

It seems that the commit caps the amount of CPU resources that can be
utilized, leaving around 18% idle on aarch64 and 3% idle on x86_64,
which is likely the main reason behind the reported fork regression.
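
The drop and idle percentages quoted above can be re-derived from the
autogroup-enabled table with a short sketch like the one below. The
figures are copied from that table; treating everything that is not "us"
or "sy" as idle is a simplifying assumption, and the function names are
illustrative.

```python
# (lps, %us, %sy) copied from the autogroup-enabled table.
RESULTS = {
    "aarch64": {"without": (28677.0, 3.2, 96.7), "with": (19860.7, 2.7, 79.4)},
    "x86_64": {"without": (27776.2, 3.1, 96.9), "with": (25020.6, 4.1, 93.2)},
}


def drop_percent(before, after):
    """Relative throughput loss, in percent."""
    return (before - after) / before * 100.0


def idle_percent(us, sy):
    """Approximate idle time: assumes all non-us, non-sy time is idle."""
    return 100.0 - us - sy


def summarize(results):
    """Map each arch to (lps drop %, idle % with the commit applied)."""
    summary = {}
    for arch, rows in results.items():
        lps_before, _, _ = rows["without"]
        lps_after, us, sy = rows["with"]
        summary[arch] = (
            round(drop_percent(lps_before, lps_after), 1),
            round(idle_percent(us, sy), 1),
        )
    return summary


if __name__ == "__main__":
    for arch, (drop, idle) in summarize(RESULTS).items():
        print(f"{arch}: {drop}% lps drop, ~{idle}% idle with the commit")
```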

When autogroup is disabled:

Arch      | commit eff6c8ce8d4d | Fork Result (lps)  |  %Cpu(s)
----------+---------------------+--------------------+------------------
aarch64   | without             | 19877.8            |  2.2 us, 80.1 sy
aarch64   | with                | 20086.3 (~same)    |  1.9 us, 80.2 sy
x86_64    | without             | 24974.2            |  4.9 us, 92.5 sy
x86_64    | with                | 24921.5 (~same)    |  4.9 us, 92.4 sy
----------+---------------------+--------------------+------------------

So with autogroup disabled, I still see about 18% and 3% of CPU
resources idle on aarch64 and x86_64 respectively, regardless of the
commit.

Is this performance drop an expected effect of this commit when
autogroup is enabled?

Thanks,
Hagar

[1] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench
[2] Used command: ./Run -c 4 spawn