Message-ID: <996ca8cb-3ac8-4f1b-93f1-415f43922d7a@ateme.com>
Date: Mon, 28 Apr 2025 07:43:05 +0000
From: Jean-Baptiste Roquefere <jb.roquefere@...me.com>
To: K Prateek Nayak <kprateek.nayak@....com>, "stable@...r.kernel.org"
	<stable@...r.kernel.org>, "Gautham R. Shenoy" <gautham.shenoy@....com>,
	Swapnil Sapkal <swapnil.sapkal@....com>
CC: "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
	"mingo@...nel.org" <mingo@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Borislav Petkov <bp@...en8.de>
Subject: Re: IPC drop down on AMD epyc 7702P

Hello Prateek,

thanks for your response.


> Looking at the commit logs, it looks like these commits do solve other
> problems around load balancing and might not be trivial to revert
> without evaluating the damages.

It's definitely not a productizable workaround!

> The processor you are running on, the AMD EPYC 7702P based on the Zen2
> architecture contains 4 cores / 8 threads per CCX (LLC domain) which is
> perhaps why reducing the thread count to below this limit is helping
> your workload.
>
> What we suspect is that when running the workload, the threads that
> regularly sleep trigger a newidle balancing which causes them to move
> to another CCX leading to higher number of L3 misses.
>
> To confirm this, would it be possible to run the workload with the
> not-yet-upstream perf sched stats [1] tool and share the result from
> perf sched stats diff for the data from v6.12.17 and v6.12.17 + patch
> to rule out any other second order effect.
>
> [1] 
> https://lore.kernel.org/all/20250311120230.61774-1-swapnil.sapkal@amd.com/

I had to patch tools/perf/util/session.c (static int
open_file_read(struct perf_data *data)) to work around a "failed to
open perf.data: File exists" error (it looked more like a compiler
issue than a tools/perf issue).
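For anyone wanting to reproduce this, roughly how the data was collected (a sketch; the b4 invocation and the workload name ./my_workload are assumptions on my part, the message-id comes from the link above):

```shell
# Apply the not-yet-upstream "perf sched stats" series on top of the
# kernel tree and rebuild perf (assumes b4 is installed):
b4 am -o - 20250311120230.61774-1-swapnil.sapkal@amd.com | git am
make -C tools/perf

# Record scheduler stats while the workload runs, once per kernel,
# renaming perf.data between runs:
./perf sched stats record -- ./my_workload
mv perf.data perf.data.6.12.17
```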

$ ./perf sched stats diff perf.data.6.12.17 perf.data.6.12.17patched > perf.diff

(see perf.diff attached)

> Assuming you control these deployments, would it possible to run
> the workload on a kernel running with "relax_domain_level=2" kernel
> cmdline that restricts newidle balance to only within the CCX. As a
> side effect, it also limits  task wakeups to the same LLC domain but
> I would still like to know if this makes a difference to the
> workload you are running.
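To double-check the parameter took effect, something like the following can be used (a sketch; the debugfs path requires CONFIG_SCHED_DEBUG and a mounted debugfs):

```shell
# relax_domain_level=2 goes on the kernel command line, e.g. via
# GRUB_CMDLINE_LINUX in /etc/default/grub; confirm after reboot:
grep -o 'relax_domain_level=[0-9]*' /proc/cmdline

# The per-domain flags then show which sched domains still allow
# NEWIDLE balancing:
grep -H . /sys/kernel/debug/sched/domains/cpu0/domain*/flags
```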
On vanilla 6.12.17 it gives the IPC we expected:

+--------------------------+--------------------------+----------------------+
|                          | relax_domain_level unset | relax_domain_level=2 |
+--------------------------+--------------------------+----------------------+
| Threads                  | 210                      | 210                  |
| Utilization (%)          | 65.86                    | 52.01                |
| CPU effective freq (MHz) | 1622.93                  | 1294.12              |
| IPC                      | 1.14                     | 1.42                 |
| L2 access (pti)          | 34.36                    | 38.18                |
| L2 miss (pti)            | 7.34                     | 7.78                 |
| L3 miss (abs)            | 39,711,971,741           | 33,929,609,924       |
| Mem (GB/s)               | 70.68                    | 49.10                |
| Context switches         | 109,281,524              | 107,896,729          |
+--------------------------+--------------------------+----------------------+
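Metrics like the IPC and context switches above can also be sampled with stock perf stat; a minimal sketch (the 60 s window is an arbitrary choice):

```shell
# System-wide counters over a fixed window; IPC = instructions / cycles.
perf stat -a -e cycles,instructions,context-switches -- sleep 60
```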

Kind regards,

JB

View attachment "perf.diff" of type "text/x-patch" (20149 bytes)
