Message-ID: <2581de5a-969b-93c7-0565-2eef51717900@amd.com>
Date:   Wed, 18 Mar 2020 09:46:41 -0500
From:   Kim Phillips <kim.phillips@....com>
To:     Stephane Eranian <eranian@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Jiri Olsa <jolsa@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Michael Petlan <mpetlan@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, x86 <x86@...nel.org>
Subject: Re: [PATCH 1/3 v2] perf/amd/uncore: Prepare L3 thread mask code for
 Family 19h support

On 3/17/20 9:09 PM, Stephane Eranian wrote:
> On Fri, Mar 13, 2020 at 4:10 PM Kim Phillips <kim.phillips@....com> wrote:
>> +++ b/arch/x86/events/amd/uncore.c
>> @@ -180,6 +180,20 @@ static void amd_uncore_del(struct perf_event *event, int flags)
>>         hwc->idx = -1;
>>  }
>>
>> +/*
>> + * Convert logical cpu number to L3 PMC Config ThreadMask format
>> + */
>> +static u64 l3_thread_slice_mask(int cpu)
>> +{
>> +       int thread = 2 * (cpu_data(cpu).cpu_core_id % 4);
>> +
>> +       if (smp_num_siblings > 1)
>> +               thread += cpu_data(cpu).apicid & 1;
>> +
>> +       return (1ULL << (AMD64_L3_THREAD_SHIFT + thread) &
>> +               AMD64_L3_THREAD_MASK) | AMD64_L3_SLICE_MASK;
>> +}
>> +
>>  static int amd_uncore_event_init(struct perf_event *event)
>>  {
>>         struct amd_uncore *uncore;
>> @@ -209,15 +223,8 @@ static int amd_uncore_event_init(struct perf_event *event)
>>          * SliceMask and ThreadMask need to be set for certain L3 events in
>>          * Family 17h. For other events, the two fields do not affect the count.
>>          */
>> -       if (l3_mask && is_llc_event(event)) {
>> -               int thread = 2 * (cpu_data(event->cpu).cpu_core_id % 4);
>> -
>> -               if (smp_num_siblings > 1)
>> -                       thread += cpu_data(event->cpu).apicid & 1;
>> -
>> -               hwc->config |= (1ULL << (AMD64_L3_THREAD_SHIFT + thread) &
>> -                               AMD64_L3_THREAD_MASK) | AMD64_L3_SLICE_MASK;
>> -       }
>> +       if (l3_mask && is_llc_event(event))
>> +               hwc->config |= l3_thread_slice_mask(event->cpu);
>>
> By looking at this code, I realized that even on Zen2 this is wrong.
> It does not work well.
> You are basically saying that the L3 event is tied to the CPU the
> event is programmed to.
> But this does not work with the cpumask programmed for the amd_l3 PMU. This mask
> shows, as it should, one CPU/CCX. So that means that when I do:
> 
> $ perf stat -a amd_l3/event=llc_event/
> 
> This only collects on the CPUs listed in the cpumask: 0,4,8,12 ....
> That means that L3 events generated by the other CPUs on the CCX are
> not monitored.
> I can easily see the problem by pinning a memory bound program to
> CPU64, for instance.

Right, the higher-level code calls the driver once, with cpu == 0,
if the perf tool is invoked with a plain system-wide -a.
If the tool is invoked with switches supplementing -a, such as -C 0-255
and -A, the driver gets called multiple times, once for each unique cpu
value.  The latter is the expected invocation style when measuring
a benchmark pinned to a subset of cpus, i.e., when evaluating
the driver, and is the more deterministic behaviour for the driver
to have, given that it cannot otherwise tell the difference.
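
For illustration, a user-space sketch of what each such per-cpu call
ends up programming. It mirrors the l3_thread_slice_mask() helper from
the patch, but takes the core id, APIC id and SMT sibling count as plain
parameters instead of reading cpu_data(). The function name and the
L3_* constants are assumptions made for this sketch (the constants are
meant to match the kernel's AMD64_L3_* definitions).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, intended to match the kernel's AMD64_L3_* macros. */
#define L3_SLICE_SHIFT   48
#define L3_SLICE_MASK    (0xFULL << L3_SLICE_SHIFT)
#define L3_THREAD_SHIFT  56
#define L3_THREAD_MASK   (0xFFULL << L3_THREAD_SHIFT)

/*
 * Hypothetical stand-in for l3_thread_slice_mask(): selects exactly one
 * hardware thread within the CCX and enables all L3 slices.
 */
static uint64_t sketch_thread_slice_mask(int core_id, int apicid,
                                         int smt_siblings)
{
    int thread;

    assert(core_id >= 0 && apicid >= 0);

    /* Two ThreadMask bits per core, four cores per CCX. */
    thread = 2 * (core_id % 4);

    /* With SMT enabled, the low APIC id bit picks the sibling. */
    if (smt_siblings > 1)
        thread += apicid & 1;

    return ((1ULL << (L3_THREAD_SHIFT + thread)) & L3_THREAD_MASK) |
           L3_SLICE_MASK;
}
```

With these assumptions, a call for core 0, thread 0 sets only bit 56 of
the ThreadMask field, so events from the other threads in the same CCX
are not counted by that event.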

> I think the thread mask should be exposed to the user. If not
> specified, then set the mask to
> cover all CPUs of the CCX. That way you can pick and choose what you
> want. And with one event/CCX
> you can monitor  for all CPUs. I can send a patch that does that.

Do you mean something that will allow the user to do something
like this?:

perf stat -a amd_l3/event=llc_event,core=X,thread_mask={1,2,3}/

Wouldn't users rather specify cpus using -C etc.?

> With what you have now, you have to force the list of CPUs with -C to
> work around
> the cpumask. And forcing the cpumask to 0-255 does not make sense because not
> all L3 events necessarily need the L3 mask, so you don't want to program them on
> all CPUs especially with 8 cpus/CCX and only 6 counters.

Is it not possible for those to be run in separate invocations
that use the simple system-wide case, e.g., -a?

How would adding core=X,thread_mask={1,2,3} specification
change the -C invocation behaviour?

I thought of having the driver set all CPUs in the threadmask
if invoked with cpu == 0, but that means one cannot specify
-C 0,4,8, etc.
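
A minimal sketch of the "cover all CPUs of the CCX unless told
otherwise" idea, assuming a hypothetical user-supplied thread_mask
value where 0 means "not specified". The function name, the parameter
and the L3_* constants are assumptions for illustration only, not
existing driver code or an existing perf event format.

```c
#include <stdint.h>

/* Assumed values, intended to match the kernel's AMD64_L3_* macros. */
#define L3_SLICE_SHIFT   48
#define L3_SLICE_MASK    (0xFULL << L3_SLICE_SHIFT)
#define L3_THREAD_SHIFT  56
#define L3_THREAD_MASK   (0xFFULL << L3_THREAD_SHIFT)

/*
 * Hypothetical: build the SliceMask/ThreadMask config bits from an
 * optional user thread mask. A value of 0 is treated as "unspecified"
 * and falls back to all eight threads of the CCX.
 */
static uint64_t sketch_l3_config(uint8_t user_thread_mask)
{
    uint64_t threads = user_thread_mask ? user_thread_mask : 0xFF;

    return ((threads << L3_THREAD_SHIFT) & L3_THREAD_MASK) |
           L3_SLICE_MASK;
}
```

In this sketch one event per CCX would count for all threads by
default, while a non-zero mask such as 0x03 would restrict counting to
the two threads of the first core.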

Thanks,

Kim
