[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fU0-QDMP-VG3O1qBvJ8uzHHYCQ8j1Vrzy9a0YUk=UMvHw@mail.gmail.com>
Date: Wed, 27 Aug 2025 08:15:29 -0700
From: Ian Rogers <irogers@...gle.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Robin Murphy <robin.murphy@....com>, Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
will@...nel.org, acme@...nel.org, namhyung@...nel.org,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org, adrian.hunter@...el.com,
kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-alpha@...r.kernel.org,
linux-snps-arc@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
imx@...ts.linux.dev, linux-csky@...r.kernel.org, loongarch@...ts.linux.dev,
linux-mips@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org, linux-sh@...r.kernel.org,
sparclinux@...r.kernel.org, linux-pm@...r.kernel.org,
linux-rockchip@...ts.infradead.org, dmaengine@...r.kernel.org,
linux-fpga@...r.kernel.org, amd-gfx@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
intel-xe@...ts.freedesktop.org, coresight@...ts.linaro.org,
iommu@...ts.linux.dev, linux-amlogic@...ts.infradead.org,
linux-cxl@...r.kernel.org, linux-arm-msm@...r.kernel.org,
linux-riscv@...ts.infradead.org
Subject: Re: [PATCH 12/19] perf: Ignore event state for group validation
On Wed, Aug 27, 2025 at 1:18 AM Mark Rutland <mark.rutland@....com> wrote:
>
> On Tue, Aug 26, 2025 at 11:48:48AM -0700, Ian Rogers wrote:
> > On Tue, Aug 26, 2025 at 8:32 AM Robin Murphy <robin.murphy@....com> wrote:
> > >
> > > On 2025-08-26 2:03 pm, Peter Zijlstra wrote:
> > > > On Wed, Aug 13, 2025 at 06:01:04PM +0100, Robin Murphy wrote:
> > > >> It may have been different long ago, but today it seems wrong for these
> > > >> drivers to skip counting disabled sibling events in group validation,
> > > >> given that perf_event_enable() could make them schedulable again, and
> > > >> thus increase the effective size of the group later. Conversely, if a
> > > >> sibling event is truly dead then it stands to reason that the whole
> > > >> group is dead, so it's not worth going to any special effort to try to
> > > >> squeeze in a new event that's never going to run anyway. Thus, we can
> > > >> simply remove all these checks.
> > > >
> > > > So currently you can do sort of a manual event rotation inside an
> > > > over-sized group and have it work.
> > > >
> > > > I'm not sure if anybody actually does this, but its possible.
> > > >
> > > > Eg. on a PMU that supports only 4 counters, create a group of 5 and
> > > > periodically cycle which of the 5 events is off.
> >
> > I'm not sure this is true, I thought this would fail in the
> > perf_event_open when adding the 5th event and there being insufficient
> > counters for the group.
>
> We're talking specifically about cases where the logic in a pmu's
> pmu::event_init() callback doesn't count events in specific states, and
> hence the 5th even doesn't get rejected when it is initialised.
>
> For example, in arch/x86/events/core.c, validate_group() uses
> collect_events(), which has:
>
> for_each_sibling_event(event, leader) {
> if (!is_x86_event(event) || event->state <= PERF_EVENT_STATE_OFF)
> continue;
>
> if (collect_event(cpuc, event, max_count, n))
> return -EINVAL;
>
> n++;
> }
>
> ... and so where an event's state is <= PERF_EVENT_STATE_OFF at init
> time, that event is not counted to see if it fits into HW counters.
Hmm.. Thinking out loud. So it looked like perf with weak groups could
be broken then:
```
$ sudo perf stat -vv -e '{instructions,cycles}:W' true
...
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0x400000001
(cpu_core/PERF_COUNT_HW_INSTRUCTIONS/)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 3337764 cpu -1 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0x400000000
(cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
inherit 1
------------------------------------------------------------
sys_perf_event_open: pid 3337764 cpu -1 group_fd 5 flags 0x8 = 7
...
```
Note, the group leader (instructions) is disabled because of:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/stat.c?h=perf-tools-next#n761
```
/*
* Disabling all counters initially, they will be enabled
* either manually by us or by kernel via enable_on_exec
* set later.
*/
if (evsel__is_group_leader(evsel)) {
attr->disabled = 1;
```
but the checking of being disabled (PERF_EVENT_STATE_OFF) is only done
on siblings in the code you show above. So yes, you can disable the
group events to allow the perf_event_open to succeed but not on the
leader which is always checked (no PERF_EVENT_STATE_OFF check):
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/arch/x86/events/core.c?h=perf-tools-next#n1204
```
if (is_x86_event(leader)) {
if (collect_event(cpuc, leader, max_count, n))
return -EINVAL;
```
Thanks,
Ian
Powered by blists - more mailing lists