Date: Wed, 18 Mar 2020 14:26:16 -0700
From: Stephane Eranian <eranian@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Kim Phillips <kim.phillips@....com>, Ingo Molnar <mingo@...nel.org>,
    Ingo Molnar <mingo@...hat.com>, Thomas Gleixner <tglx@...utronix.de>,
    Borislav Petkov <bp@...en8.de>,
    Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
    Arnaldo Carvalho de Melo <acme@...nel.org>,
    "H. Peter Anvin" <hpa@...or.com>, Jiri Olsa <jolsa@...hat.com>,
    Mark Rutland <mark.rutland@....com>, Michael Petlan <mpetlan@...hat.com>,
    Namhyung Kim <namhyung@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
    x86 <x86@...nel.org>
Subject: Re: [PATCH 1/3 v2] perf/amd/uncore: Prepare L3 thread mask code for Family 19h support

On Wed, Mar 18, 2020 at 1:43 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Wed, Mar 18, 2020 at 09:46:41AM -0500, Kim Phillips wrote:
>
> > > But this does not work with the cpumask programmed for the amd_l3 PMU. This mask
> > > shows, as it should, one CPU/CCX. So that means that when I do:
> > >
> > > $ perf stat -a amd_l3/event=llc_event/
> > >
> > > This only collects on the CPUs listed in the cpumask: 0,4,8,12 ....
> > > That means that L3 events generated by the other CPUs on the CCX are
> > > not monitored.
> > > I can easily see the problem by pinning a memory bound program to
> > > CPU64, for instance.
> >
> > Right, the higher level code calls the driver with a single cpu==0
> > call if the perf tool is invoked with a simple -a style system-wide.

No, it does not. With -a, when -C is not passed, the perf tool picks up
the cpumask for the PMU from sysfs:

$ cat /sys/devices/amd_l3/cpumask

You can easily verify this by running:

$ strace -etrace=perf_event_open perf stat -a -e amd_l3/event=0x00/

This is the default common mode. The problem is that here, to get any
meaningful result, you need to force a -C. The CPU in the cpumask is
just the CPU to which to attach the event in order to access the
correct uncore PMU.
Here, you have one CPU per CCX, which is expected and perfectly fine.
The thread_mask is a hardware filter on the uncore L3 PMU. If you set
the thread_mask to 0xff by default, then you obtain a full system view
with a simple -a, or a per-socket view with --per-socket.

So we need to find a way to make this common case work properly first.
Expecting users to know that for some amd_l3 events they need to
force -C 0-255 is not practical. I also think that forcing the cpumask
to 0-255 is not the right solution. This is not how this is done for any
other uncore PMU I know of, and some do have the thread filter, such
as the Skylake CHA.

> > If the tool is invoked with supplemental switches to -a, like -C 0-255,
> > and -A, the driver gets called multiple times with all the unique cpu
> > values. The latter is the expected invocation style when measuring
> > a benchmark pinned on a subset of cpus, i.e., when evaluating
> > the driver, and is the more deterministic behaviour for the driver
> > to have, given it cannot tell the difference otherwise.
>
> That seems to suggest it is all horribly broken.
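
For readers who want to see which CPUs a plain -a run would attach to,
the sysfs lookup described in the message can be sketched roughly as
below. This is only an illustration, not what the perf tool actually
runs: parse_cpulist and read_pmu_cpumask are hypothetical helper names,
and the default directory /sys/bus/event_source/devices is an assumption
about where the PMU is exposed on the system at hand.

```python
from pathlib import Path
from typing import List


def parse_cpulist(text: str) -> List[int]:
    """Expand a kernel-style CPU list such as "0,4,8-11" into sorted CPU ids."""
    cpus = set()
    for chunk in text.strip().split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate an empty file or a trailing comma
        if "-" in chunk:
            # A range "a-b" covers both endpoints.
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


def read_pmu_cpumask(pmu: str,
                     root: str = "/sys/bus/event_source/devices") -> List[int]:
    """Read <root>/<pmu>/cpumask and return the CPUs it lists.

    The default root is an assumption; pass a different directory if the
    PMU is exposed elsewhere on your system.
    """
    return parse_cpulist((Path(root) / pmu / "cpumask").read_text())
```

On a machine with the amd_l3 PMU, read_pmu_cpumask("amd_l3") should
return one CPU per CCX, matching the 0,4,8,12 ... pattern quoted above.
Events are opened only on those CPUs unless -C overrides the list.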