Message-ID: <922f69f6-e290-46f6-af6f-5a71e4508cf0@linux.intel.com>
Date: Tue, 12 Aug 2025 16:51:28 -0700
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Yunseong Kim <ysk@...lloc.com>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>
Cc: Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>, Will Deacon <will@...nel.org>,
Yeoreum Yun <yeoreum.yun@....com>, Austin Kim <austindh.kim@...il.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
syzkaller@...glegroups.com
Subject: Re: [PATCH v3] perf: Avoid undefined behavior from stopping/starting
inactive events
On 2025-08-12 11:10 a.m., Yunseong Kim wrote:
> Calling pmu->start()/stop() on perf events in PERF_EVENT_STATE_OFF can
> leave event->hw.idx at -1. When PMU drivers later attempt to use this
> negative index as a shift exponent in bitwise operations, it leads to UBSAN
> shift-out-of-bounds reports.
>
> The issue is a logical flaw in how event groups handle throttling when some
> members are intentionally disabled. Based on the analysis and the
> reproducer provided by Mark Rutland, the issue reproduces on both arm64
> and x86-64.
>
> The scenario unfolds as follows:
>
> 1. A group leader event is configured with a very aggressive sampling
> period (e.g., sample_period = 1). This causes frequent interrupts and
> triggers the throttling mechanism.
> 2. A child event in the same group is created in a disabled state
> (.disabled = 1). This event remains in PERF_EVENT_STATE_OFF.
> Since it hasn't been scheduled onto the PMU, its event->hw.idx remains
> initialized at -1.
> 3. When throttling occurs, perf_event_throttle_group() and later
> perf_event_unthrottle_group() iterate through all siblings, including
> the disabled child event.
> 4. perf_event_throttle()/unthrottle() are called on this inactive child
> event, which then call event->pmu->start()/stop().
> 5. The PMU driver receives the event with hw.idx == -1 and attempts to
> use it as a shift exponent. e.g., in macros like PMCNTENSET(idx),
> leading to the UBSAN report.
>
> The throttling mechanism attempts to start/stop events that are not
> actively scheduled on the hardware.
>
> Move the state check into perf_event_throttle()/perf_event_unthrottle() so
> that inactive events are skipped entirely. This ensures only active events
> with a valid hw.idx are processed, preventing undefined behavior and
> silencing UBSAN warnings. The added check verifies that the event is in
> PERF_EVENT_STATE_ACTIVE before proceeding with PMU operations.
>
> The problem can be reproduced with the syzkaller reproducer:
> Link: https://lore.kernel.org/lkml/714b7ba2-693e-42e4-bce4-feef2a5e7613@kzalloc.com/
>
> Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
> Cc: Mark Rutland <mark.rutland@....com>
> Signed-off-by: Yunseong Kim <ysk@...lloc.com>
Thanks for the fix.
Reviewed-by: Kan Liang <kan.liang@...ux.intel.com>
Thanks,
Kan
> ---
> kernel/events/core.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8060c2857bb2..872122e074e5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2665,6 +2665,9 @@ static void perf_log_itrace_start(struct perf_event *event);
>
> static void perf_event_unthrottle(struct perf_event *event, bool start)
> {
> + if (event->state != PERF_EVENT_STATE_ACTIVE)
> + return;
> +
> event->hw.interrupts = 0;
> if (start)
> event->pmu->start(event, 0);
> @@ -2674,6 +2677,9 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
>
> static void perf_event_throttle(struct perf_event *event)
> {
> + if (event->state != PERF_EVENT_STATE_ACTIVE)
> + return;
> +
> event->hw.interrupts = MAX_INTERRUPTS;
> event->pmu->stop(event, 0);
> if (event == event->group_leader)