linux-kernel - Re: [PATCH v2 3/6] perf pmu: Add support for event.cpus files in sysfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAP-5=fUfoMZ0HjCNoJe6hgEMi5ciY+LqFjBbLzfiZgO6dioshA@mail.gmail.com>
Date: Thu, 18 Jul 2024 08:39:21 -0700
From: Ian Rogers <irogers@...gle.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, 
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>, 
	Adrian Hunter <adrian.hunter@...el.com>, Bjorn Helgaas <bhelgaas@...gle.com>, 
	Jonathan Corbet <corbet@....net>, James Clark <james.clark@....com>, 
	Ravi Bangoria <ravi.bangoria@....com>, Dominique Martinet <asmadeus@...ewreck.org>, 
	linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org, 
	Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>, ananth.narayan@....com, gautham.shenoy@....com, 
	kprateek.nayak@....com, sandipan.das@....com
Subject: Re: [PATCH v2 3/6] perf pmu: Add support for event.cpus files in sysfs

On Thu, Jul 18, 2024 at 7:33 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>
>
>
> On 2024-07-17 8:30 p.m., Ian Rogers wrote:
> > If an event file exists in sysfs, check if a event.cpus file exists
> > and read a perf_cpu_map from it if it does. This allows particular
> > events to have a different set of CPUs compared to the PMU.
> >
> > One scenario where this could be useful is when a PMU is set up with a
> > cpumask/events per SMT thread but some events record for all SMT
> > threads. Programming an event on each SMT thread will cause
> > unnecessary counters to be programmed and the aggregate value to be
> > too large.
>
> If I understand the scenario correctly, I think the issue should have
> been addressed since ICX, with the introduction of  the "SMT-aware
> events". Is there a specific event which still has the issue on newer
> platforms?

Nothing comes to mind, but it is there on popular machines like Skylake.

> >
> > Another scenario where this could be useful if when a PMU has
> > historically had a cpumask at the package level, but now newer per
> > die, core or CPU information is available.
>
> The traditional way to support new counters with a different scope is to
> add a new PMU.
>
> >
> > Additional context for the motivation is in these patches and
> > conversation:
> > https://lore.kernel.org/lkml/20240711102436.4432-1-Dhananjay.Ugwekar@amd.com/
>
> I don't see the advantages of the new event.cpus way.
> The aggregation should be the same.

Agreed. My concern is that we may end up with a pattern of
<pmu>_per_pkg, <pmu>_per_core, <pmu>_per_cpu, <pmu>_per_l3, etc. just
for the sake of the cpumask.

> The RAPL counters are free-running counters. So there is no difference
> whether grouping or not grouping.

Should the RAPL counters return true for perf_pmu__is_software? In the
tool we currently return false and won't allow grouping, these events
with other hardware events. The intent in perf_pmu__is_software was to
emulate the way the kernel allows/disallows groups - software context
events can be in a hardware context but otherwise I don't believe the
grouping is allowed.

> But it makes the kernel driver complex, since it has to maintain at
> least two different cpumasks.

Two drivers each maintaining a cpumask or 1 driver maintaining 2
cpumasks, it seems like there is chance for code reuse in both cases.
I'm not seeing how it adds to complexity particularly.

> The tool becomes complex either, since it has to take care of the
> per-event CPU override case.

I'm not sure I agree with this. What we need for perf_event_open is a
perf_event_attr, we dress these up as evsels which also have the
ability to be for >1 CPU or thread. Longer term I think evsels can
also have >1 PMU, for the wildcard cases like uncore memory
controllers - this would remove the need for resorting evsels except
for topdown events which have thrown us a giant bundle of challenges.
Anyway, so an evsel is perf_event_attr information paired with CPU
information, it makes sense to me that the parser should do this
pairing. What's harder in the evsel/evlist setup is trying to fix CPU
maps up not in parsing, like with __perf_evlist__propagate_maps where
the parsing is trying to leave crumbs around (like system_wide,
has_user_cpus, is_pmu_core) so the map propagation works properly.

> The json file must also be updated to add a
> new field cpumask.

Yeah, I don't like this as it means we end up putting CPU information
into the json that isn't the same for every CPU variant of the same
architecture model. Maybe we can have some kind of "scope" enum value
in the json and then when the scope differs from the PMU's, core scope
vs the PMU's hyperthread scope, then in the tool we can figure out the
cpumask from the topology in sysfs. Maybe we should just always use
the topology and get rid of cpumask files in sysfs, replacing them
with "scope" files. Will Deacon pushed back on having ARM PMUs
supporting hot plugging
(https://lore.kernel.org/lkml/20240701142222.GA2691@willie-the-truck/)
where the main thing hot plugging handler needs to maintain is set the
cpumask.

Thanks,
Ian