linux-kernel - Re: [PATCH 12/19] perf: Ignore event state for group validation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fU0-QDMP-VG3O1qBvJ8uzHHYCQ8j1Vrzy9a0YUk=UMvHw@mail.gmail.com>
Date: Wed, 27 Aug 2025 08:15:29 -0700
From: Ian Rogers <irogers@...gle.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Robin Murphy <robin.murphy@....com>, Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com, 
	will@...nel.org, acme@...nel.org, namhyung@...nel.org, 
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org, adrian.hunter@...el.com, 
	kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-alpha@...r.kernel.org, 
	linux-snps-arc@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org, 
	imx@...ts.linux.dev, linux-csky@...r.kernel.org, loongarch@...ts.linux.dev, 
	linux-mips@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org, 
	linux-s390@...r.kernel.org, linux-sh@...r.kernel.org, 
	sparclinux@...r.kernel.org, linux-pm@...r.kernel.org, 
	linux-rockchip@...ts.infradead.org, dmaengine@...r.kernel.org, 
	linux-fpga@...r.kernel.org, amd-gfx@...ts.freedesktop.org, 
	dri-devel@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org, 
	intel-xe@...ts.freedesktop.org, coresight@...ts.linaro.org, 
	iommu@...ts.linux.dev, linux-amlogic@...ts.infradead.org, 
	linux-cxl@...r.kernel.org, linux-arm-msm@...r.kernel.org, 
	linux-riscv@...ts.infradead.org
Subject: Re: [PATCH 12/19] perf: Ignore event state for group validation

On Wed, Aug 27, 2025 at 1:18 AM Mark Rutland <mark.rutland@....com> wrote:
>
> On Tue, Aug 26, 2025 at 11:48:48AM -0700, Ian Rogers wrote:
> > On Tue, Aug 26, 2025 at 8:32 AM Robin Murphy <robin.murphy@....com> wrote:
> > >
> > > On 2025-08-26 2:03 pm, Peter Zijlstra wrote:
> > > > On Wed, Aug 13, 2025 at 06:01:04PM +0100, Robin Murphy wrote:
> > > >> It may have been different long ago, but today it seems wrong for these
> > > >> drivers to skip counting disabled sibling events in group validation,
> > > >> given that perf_event_enable() could make them schedulable again, and
> > > >> thus increase the effective size of the group later. Conversely, if a
> > > >> sibling event is truly dead then it stands to reason that the whole
> > > >> group is dead, so it's not worth going to any special effort to try to
> > > >> squeeze in a new event that's never going to run anyway. Thus, we can
> > > >> simply remove all these checks.
> > > >
> > > > So currently you can do sort of a manual event rotation inside an
> > > > over-sized group and have it work.
> > > >
> > > > I'm not sure if anybody actually does this, but its possible.
> > > >
> > > > Eg. on a PMU that supports only 4 counters, create a group of 5 and
> > > > periodically cycle which of the 5 events is off.
> >
> > I'm not sure this is true, I thought this would fail in the
> > perf_event_open when adding the 5th event and there being insufficient
> > counters for the group.
>
> We're talking specifically about cases where the logic in a pmu's
> pmu::event_init() callback doesn't count events in specific states, and
> hence the 5th even doesn't get rejected when it is initialised.
>
> For example, in arch/x86/events/core.c, validate_group() uses
> collect_events(), which has:
>
>         for_each_sibling_event(event, leader) {
>                 if (!is_x86_event(event) || event->state <= PERF_EVENT_STATE_OFF)
>                         continue;
>
>                 if (collect_event(cpuc, event, max_count, n))
>                         return -EINVAL;
>
>                 n++;
>         }
>
> ... and so where an event's state is <= PERF_EVENT_STATE_OFF at init
> time, that event is not counted to see if it fits into HW counters.

Hmm.. Thinking out loud. So it looked like perf with weak groups could
be broken then:
```
$ sudo perf stat -vv -e '{instructions,cycles}:W' true
...
perf_event_attr:
 type                             0 (PERF_TYPE_HARDWARE)
 size                             136
 config                           0x400000001
(cpu_core/PERF_COUNT_HW_INSTRUCTIONS/)
 sample_type                      IDENTIFIER
 read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
 disabled                         1
 inherit                          1
 enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 3337764  cpu -1  group_fd -1  flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
 type                             0 (PERF_TYPE_HARDWARE)
 size                             136
 config                           0x400000000
(cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
 sample_type                      IDENTIFIER
 read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
 inherit                          1
------------------------------------------------------------
sys_perf_event_open: pid 3337764  cpu -1  group_fd 5  flags 0x8 = 7
...
```
Note, the group leader (instructions) is disabled because of:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/stat.c?h=perf-tools-next#n761
```
/*
* Disabling all counters initially, they will be enabled
* either manually by us or by kernel via enable_on_exec
* set later.
*/
if (evsel__is_group_leader(evsel)) {
        attr->disabled = 1;
```
but the checking of being disabled (PERF_EVENT_STATE_OFF) is only done
on siblings in the code you show above. So yes, you can disable the
group events to allow the perf_event_open to succeed but not on the
leader which is always checked (no PERF_EVENT_STATE_OFF check):
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/arch/x86/events/core.c?h=perf-tools-next#n1204
```
if (is_x86_event(leader)) {
        if (collect_event(cpuc, leader, max_count, n))
                return -EINVAL;
```

Thanks,
Ian