Message-ID: <CABPqkBQn7VRWQu-JU9BfE8y3g_uqhKEtN6GYuVuKs6QTGPHzgw@mail.gmail.com>
Date:   Wed, 18 Mar 2020 14:26:16 -0700
From:   Stephane Eranian <eranian@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Kim Phillips <kim.phillips@....com>,
        Ingo Molnar <mingo@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Jiri Olsa <jolsa@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Michael Petlan <mpetlan@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, x86 <x86@...nel.org>
Subject: Re: [PATCH 1/3 v2] perf/amd/uncore: Prepare L3 thread mask code for
 Family 19h support

On Wed, Mar 18, 2020 at 1:43 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Wed, Mar 18, 2020 at 09:46:41AM -0500, Kim Phillips wrote:
>
> > > But this does not work with the cpumask programmed for the amd_l3 PMU. This mask
> > > shows, as it should, one CPU/CCX. So that means that when I do:
> > >
> > > $ perf stat -a amd_l3/event=llc_event/
> > >
> > > This only collects on the CPUs listed in the cpumask: 0,4,8,12 ....
> > > That means that L3 events generated by the other CPUs on the CCX are
> > > not monitored.
> > > I can easily see the problem by pinning a memory bound program to
> > > CPU64, for instance.
> >
> > Right, the higher level code calls the driver with a single cpu==0
> > call if the perf tool is invoked with a simple -a style system-wide.

No, it does not.

With -a, when -C is not passed, the perf tool picks up the cpumask for
the PMU from sysfs:
$ cat /sys/devices/amd_l3/cpumask

You can easily verify this by running:
$ strace -etrace=perf_event_open perf stat -a -e amd_l3/event=0x00/
This is the default, common mode.
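
To illustrate the point about the sysfs cpumask, here is a minimal Python
sketch of how a tool could turn the cpumask string into the list of CPUs it
opens events on. The function name parse_cpu_list is hypothetical and this is
not the perf tool's actual code; it only assumes the usual comma-and-range
list format (for example "0,4,8,12" or "0-3,8").

```python
def parse_cpu_list(text):
    """Expand a CPU list string such as "0,4,8,12" or "0-3,8" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "0-3" covers both endpoints.
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


# Example input shaped like the mask described above: one CPU per CCX.
pmu_cpus = parse_cpu_list("0,4,8,12\n")
print(pmu_cpus)
```

With -a and no -C, only the CPUs in that list would get a perf_event_open
call for the PMU, which is what the strace run shows.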

The problem is that, to get any meaningful result here, you need to force a -C.
The CPU in the cpumask is just the CPU to which the event is attached in
order to access the correct uncore PMU.
Here you have one CPU per CCX, which is expected and perfectly fine.

The thread_mask is a hardware filter on the uncore L3 PMU. If you set
the thread_mask to 0xff by default, then you obtain a full system view
with a simple -a, or a per-socket view with --per-socket. So we need to
find a way to make this common case work properly first. Expecting users
to know that for some amd_l3 events they need to force -C 0-255 is not
practical. I also think that forcing the cpumask to 0-255 is not the
right solution. This is not how it is done for any other uncore PMU I
know of, and some do have a thread filter, such as the Skylake CHA.
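
The effect of the thread_mask filter can be sketched with a toy model in
Python. Everything here is an assumption for illustration: an 8-bit mask with
one bit per hardware thread of a CCX, and the helper name counted_threads. It
does not describe the actual register layout; it only shows why a mask of 0xff
counts every thread behind the one CPU the event is attached to, while a
narrower mask silently drops the rest.

```python
THREADS_PER_CCX = 8  # assumption for this toy model, not taken from hardware docs


def counted_threads(thread_mask, nr_threads=THREADS_PER_CCX):
    """Return the thread slots within one CCX whose L3 events pass the filter."""
    return [t for t in range(nr_threads) if thread_mask & (1 << t)]


# Event attached to a single CPU per CCX, as listed in the PMU cpumask.
print(counted_threads(0xff))  # every thread slot of the CCX is counted
print(counted_threads(0x01))  # only slot 0 is counted, the others are missed
```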



> > If the tool is invoked with supplemental switches to -a, like -C 0-255,
> > and -A, the driver gets called multiple times with all the unique cpu
> > values.  The latter is the expected invocation style when measuring
> > a benchmark pinned on a subset of cpus, i.e., when evaluating
> > the driver, and is the more deterministic behaviour for the driver
> > to have, given it cannot tell the difference otherwise.
>
> That seems to suggest it is all horribly broken.
