Message-ID: <20250815105605.GA3245006@noisy.programming.kicks-ass.net>
Date: Fri, 15 Aug 2025 12:56:05 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Liang, Kan" <kan.liang@...ux.intel.com>
Cc: Yunseong Kim <ysk@...lloc.com>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Will Deacon <will@...nel.org>, Yeoreum Yun <yeoreum.yun@....com>,
Austin Kim <austindh.kim@...il.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
syzkaller@...glegroups.com
Subject: Re: [PATCH v3] perf: Avoid undefined behavior from stopping/starting inactive events

On Tue, Aug 12, 2025 at 04:51:28PM -0700, Liang, Kan wrote:
>
>
> On 2025-08-12 11:10 a.m., Yunseong Kim wrote:
> > Calling pmu->start()/stop() on perf events in PERF_EVENT_STATE_OFF can
> > leave event->hw.idx at -1. When PMU drivers later attempt to use this
> > negative index as a shift exponent in bitwise operations, it leads to UBSAN
> > shift-out-of-bounds reports.
> >
> > The issue is a logical flaw in how event groups handle throttling when some
> > members are intentionally disabled. It is based on the analysis and the
> > reproducer provided by Mark Rutland; the issue occurs on both arm64 and
> > x86-64.
> >
> > The scenario unfolds as follows:
> >
> > 1. A group leader event is configured with a very aggressive sampling
> > period (e.g., sample_period = 1). This causes frequent interrupts and
> > triggers the throttling mechanism.
> > 2. A child event in the same group is created in a disabled state
> > (.disabled = 1). This event remains in PERF_EVENT_STATE_OFF.
> > Since it hasn't been scheduled onto the PMU, its event->hw.idx remains
> > at its initial value of -1.
> > 3. When throttling occurs, perf_event_throttle_group() and later
> > perf_event_unthrottle_group() iterate through all siblings, including
> > the disabled child event.
> > 4. perf_event_throttle()/perf_event_unthrottle() are called on this
> > inactive child event, and they in turn call event->pmu->stop()/start().
> > 5. The PMU driver receives the event with hw.idx == -1 and attempts to
> > use it as a shift exponent (e.g., in macros like PMCNTENSET(idx)),
> > leading to the UBSAN report.
> >
> > The throttling mechanism attempts to start/stop events that are not
> > actively scheduled on the hardware.
> >
> > Move the state check into perf_event_throttle()/perf_event_unthrottle() so
> > that inactive events are skipped entirely. This ensures only active events
> > with a valid hw.idx are processed, preventing undefined behavior and
> > silencing UBSAN warnings. The corrected check ensures the event is active
> > before proceeding with PMU operations.
> >
> > The problem can be reproduced with the syzkaller reproducer:
> > Link: https://lore.kernel.org/lkml/714b7ba2-693e-42e4-bce4-feef2a5e7613@kzalloc.com/
> >
> > Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
> > Cc: Mark Rutland <mark.rutland@....com>
> > Signed-off-by: Yunseong Kim <ysk@...lloc.com>
>
> Thanks for the fix.
>
> Reviewed-by: Kan Liang <kan.liang@...ux.intel.com>
Thanks both!