Message-ID: <996ca8cb-3ac8-4f1b-93f1-415f43922d7a@ateme.com>
Date: Mon, 28 Apr 2025 07:43:05 +0000
From: Jean-Baptiste Roquefere <jb.roquefere@...me.com>
To: K Prateek Nayak <kprateek.nayak@....com>, "stable@...r.kernel.org"
<stable@...r.kernel.org>, "Gautham R. Shenoy" <gautham.shenoy@....com>,
Swapnil Sapkal <swapnil.sapkal@....com>
CC: "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
"mingo@...nel.org" <mingo@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, Borislav Petkov <bp@...en8.de>
Subject: Re: IPC drop down on AMD epyc 7702P
Hello Prateek,
thanks for your response.
> Looking at the commit logs, it looks like these commits do solve other
> problems around load balancing and might not be trivial to revert
> without evaluating the damages.
Reverting them is definitely not a productizable workaround!
> The processor you are running on, the AMD EPYC 7702P based on the Zen2
> architecture, contains 4 cores / 8 threads per CCX (LLC domain), which is
> perhaps why reducing the thread count to below this limit is helping
> your workload.
>
> What we suspect is that when running the workload, the threads that
> regularly sleep trigger a newidle balancing which causes them to move
> to another CCX leading to higher number of L3 misses.
>
> To confirm this, would it be possible to run the workload with the
> not-yet-upstream perf sched stats [1] tool and share the result from
> perf sched stats diff for the data from v6.12.17 and v6.12.17 + patch
> to rule out any other second order effect.
>
> [1]
> https://lore.kernel.org/all/20250311120230.61774-1-swapnil.sapkal@amd.com/
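Just to confirm the topology point above: the LLC span can be read
straight from sysfs (index3 is normally the L3), for example:

$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list

On the 7702P this should list 8 hardware threads (4 cores x 2-way SMT)
per CCX, matching your description.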
I had to patch tools/perf/util/session.c (static int
open_file_read(struct perf_data *data)) to get past a "failed to open
perf.data: File exists" error (it looked more like a compiler issue than
a tools/perf issue). Then:

$ ./perf sched stats diff perf.data.6.12.17 perf.data.6.12.17patched > perf.diff

(see perf.diff attached)
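For anyone wanting to reproduce: the per-kernel perf.data files come from
the record subcommand of the same series, used roughly along these lines
(a sketch rather than the exact invocation; <workload> is a placeholder
and the file names just match the diff above):

$ ./perf sched stats record -- <workload>    # booted on v6.12.17
$ mv perf.data perf.data.6.12.17
$ ./perf sched stats record -- <workload>    # booted on v6.12.17 + patch
$ mv perf.data perf.data.6.12.17patched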
> Assuming you control these deployments, would it be possible to run
> the workload on a kernel running with "relax_domain_level=2" kernel
> cmdline that restricts newidle balance to only within the CCX. As a
> side effect, it also limits task wakeups to the same LLC domain but
> I would still like to know if this makes a difference to the
> workload you are running.
On vanilla 6.12.17 it gives the IPC we expected:
+--------------------+--------------------------+----------------------+
|                    | relax_domain_level unset | relax_domain_level=2 |
+--------------------+--------------------------+----------------------+
| Threads            | 210                      | 210                  |
| Utilization (%)    | 65.86                    | 52.01                |
| CPU effective freq | 1 622.93                 | 1 294.12             |
| IPC                | 1.14                     | 1.42                 |
| L2 access (pti)    | 34.36                    | 38.18                |
| L2 miss (pti)      | 7.34                     | 7.78                 |
| L3 miss (abs)      | 39 711 971 741           | 33 929 609 924       |
| Mem (GB/s)         | 70.68                    | 49.10                |
| Context switches   | 109 281 524              | 107 896 729          |
+--------------------+--------------------------+----------------------+
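For reference, testing this only needs the parameter appended to the
kernel command line; a GRUB-based sketch (assumption, adjust for your
bootloader):

# /etc/default/grub
GRUB_CMDLINE_LINUX="... relax_domain_level=2"

$ sudo update-grub    # or grub2-mkconfig -o /boot/grub2/grub.cfg
$ sudo reboot
$ cat /proc/cmdline   # check that relax_domain_level=2 is active

With CONFIG_SCHED_DEBUG enabled, the per-level flags under
/sys/kernel/debug/sched/domains/cpu0/domain*/flags can also be checked
to see which domain levels still have SD_BALANCE_NEWIDLE set.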
Kind regards,
JB
View attachment "perf.diff" of type "text/x-patch" (20149 bytes)