Message-ID: <aIEePonPatjNrJVk@J2N7QTR9R3>
Date: Wed, 23 Jul 2025 18:39:10 +0100
From: Mark Rutland <mark.rutland@....com>
To: Yunseong Kim <ysk@...lloc.com>
Cc: Will Deacon <will@...nel.org>, Austin Kim <austindh.kim@...il.com>,
Michelle Jin <shjy180909@...il.com>,
linux-arm-kernel@...ts.infradead.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Yeoreum Yun <yeoreum.yun@....com>, syzkaller@...glegroups.com,
Kan Liang <kan.liang@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Namhyung Kim <namhyung@...nel.org>
Subject: Re: [PATCH] perf: arm_pmuv3: Fix kernel panic on UBSAN from negative
hw.idx in armv8pmu_enable/disable_event()
On Wed, Jul 23, 2025 at 01:35:31PM +0100, Mark Rutland wrote:
> [ dropping Hemendra, since he doesn't need to be spammed with ML traffic ]
>
> On Wed, Jul 23, 2025 at 10:44:03AM +0000, Yunseong Kim wrote:
> > When 'event->hw.idx' was negative in armv8pmu_enable/disable_event(), UBSAN
> > reported:
> >
> > UBSAN: shift-out-of-bounds in drivers/perf/arm_pmuv3.c:716:25
> > shift exponent -1 is negative
> >
> > UBSAN: shift-out-of-bounds in drivers/perf/arm_pmuv3.c:658:13
> > shift exponent -1 is negative
> >
> > This occurred because a perf_event could reach armv8pmu_enable/disable_event()
> > with a negative idx, typically when a valid counter could not be allocated.
>
> These functions are never supposed to be called for an event with a
> negative idx. For that to happen there must either be an earlier bug (at
> the time of pmu::add()) or there's a concurrency bug.
AFAICT this is a result of the group throttling logic introduced in
commit:
9734e25fbf5ae68e ("perf: Fix the throttle logic for a group")
That doesn't take into account that sibling events could have
event->state <= PERF_EVENT_STATE_OFF, e.g. by virtue of
perf_event_attr::disabled. For those events, event_sched_in() won't
initialise the event, e.g. won't call event->pmu->add().
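For reference, event_sched_in() bails out early for such events with a check
along these lines (paraphrased, not a verbatim quote of any particular tree):
| 	if (event->state <= PERF_EVENT_STATE_OFF)
| 		return 0;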
Thus when perf_event_throttle_group() and perf_event_unthrottle_group()
iterate over events with:
	for_each_sibling_event(sibling, leader)
		perf_event_[un]throttle(sibling, ...);
... perf_event_[un]throttle() will call event->pmu->stop() and
event->pmu->start() for those disabled events, resulting in the UBSAN
splat above since they don't have a hw idx (which is assigned by
event->pmu->add()).
I think the event's state needs to be taken into account somewhere during
throttling. Given that event_sched_in() already checks the event's state, I'd
assume that check belongs in the core code, rather than in each architecture's
pmu::{start,stop}().
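To illustrate the shape of the check I mean, using the same shorthand as the
snippet above (this is only an untested sketch, and the check might equally
belong in perf_event_[un]throttle() itself or need to cover the leader too):
| 	for_each_sibling_event(sibling, leader) {
| 		/*
| 		 * Siblings that were never scheduled in (e.g. created with
| 		 * perf_event_attr::disabled) have no hw idx, so they must
| 		 * not see pmu->stop()/pmu->start().
| 		 */
| 		if (sibling->state <= PERF_EVENT_STATE_OFF)
| 			continue;
| 		perf_event_[un]throttle(sibling, ...);
| 	}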
I can reproduce this locally with:
| #include <stdio.h>
| #include <stdlib.h>
| #include <unistd.h>
|
| #include <sys/syscall.h>
| #include <sys/types.h>
|
| #include <linux/perf_event.h>
|
| static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
| int group_fd, unsigned long flags)
| {
| return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
| }
|
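| /* Group leader: samples every cycle, so it hits the throttling path quickly. */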
| struct perf_event_attr attr_parent = {
| .type = PERF_TYPE_HARDWARE,
| .size = sizeof(attr_parent),
| .config = PERF_COUNT_HW_CPU_CYCLES,
| .sample_period = 1,
| .exclude_kernel = 1,
| };
|
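| /*
|  * Disabled sibling: never scheduled in, so pmu->add() never assigns it a
|  * counter and hw.idx stays negative.
|  */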
| struct perf_event_attr attr_child = {
| .type = PERF_TYPE_HARDWARE,
| .size = sizeof(attr_child),
| .config = PERF_COUNT_HW_CPU_CYCLES,
| .exclude_kernel = 1,
| .disabled = 1,
| };
|
| int main(int argc, char *argv[])
| {
| int parent, child;
|
| parent = perf_event_open(&attr_parent, 0, -1, -1, 0);
| if (parent < 0) {
| fprintf(stderr, "Unable to create event: %d\n", parent);
| exit (-1);
| }
|
| child = perf_event_open(&attr_child, 0, -1, parent, 0);
| if (child < 0) {
| fprintf(stderr, "Unable to create event: %d\n", child);
| exit (-1);
| }
|
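| /* Spin in userspace so the leader keeps counting and overflowing. */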
| for (;;) {
| asm("" ::: "memory");
| }
|
| return 0;
| }
I've kept the context below for anyone new to this thread, but hopefully
the above is clear?
Mark.
> > This issue was observed when running KVM on Radxa's Orion6 platform.
>
> Do you mean that you see this on the host, or in a guest?
>
> Are you using pseudo-NMI?
>
> > The issue was previously guarded indirectly by armv8pmu_event_is_chained(),
> > which internally warned and returned false for idx < 0. But since the
> > commit 29227d6ea157 ("arm64: perf: Clean up enable/disable calls"), this
> > check was removed.
>
> The warning there was because this case *should not happen*, and
> returning false was a way of minimizing the risk of a crash before the
> warning was logged.
>
> I don't think that armv8pmu_event_is_chained() would avoid the UBSAN
> splat. Prior to commit 29227d6ea157 we had:
>
> | static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> | {
> | struct perf_event_attr *attr = &event->attr;
> | int idx = event->hw.idx;
> |
> | ...
> |
> | if (!kvm_pmu_counter_deferred(attr)) {
> | armv8pmu_enable_counter(idx);
> | if (armv8pmu_event_is_chained(event))
> | armv8pmu_enable_counter(idx - 1);
> | }
> | }
>
> Note the first call to armv8pmu_enable_counter(idx), so this wouldn't
> help for an event with event->hw.idx==-1, and the only other way we
> could get here is with a chained event with event->hw.idx==0, which is
> not valid.
>
> > To prevent undefined behavior, add an explicit guard to early return from
> > armv8pmu event if hw.idx < 0,
>
> That is not a correct fix, and simply hides the real bug. It should not
> be possible to reach this code when hw.idx < 0, and idx should be >= 0
> whenever pmu::add() succeeds.
>
> > similar to handling in other PMU drivers.
> > (e.g. intel_pmu_disable_event() on arch/x86/events/intel/core.c)
>
> I think what you're saying here is that intel_pmu_disable_event() will
> pr_warn() and return early in this case. As above, that is because this
> case is not expected to occur, and indicates a bug elsewhere.
>
> > $ ./syz-execprog -executor=./syz-executor -repeat=0 -sandbox=none \
> > -disable=binfmt_misc,cgroups,close_fds,devlink_pci,ieee802154,net_dev,net_reset,nic_vf,swap,sysctl,tun,usb,vhci,wifi \
> > -procs=8 perf.syz
> >
> > r0 = perf_event_open(&(0x7f0000000240)={
> > 0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > @perf_config_ext
> > }, 0x0, 0x0, 0xffffffffffffffff, 0x0)
> >
> > perf_event_open(&(0x7f0000000280)={
> > 0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > @perf_config_ext
> > }, 0x0, 0x0, r0, 0x0)
> >
> > perf_event_open(&(0x7f0000000540)={
> > 0x3, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4,
> > @perf_bp={0x0, 0x2}, 0x20, 0x0, 0x0, 0x0, 0x2,
> > 0x0, 0x0, 0x0, 0x0, 0x0, 0x81
> > }, 0x0, 0x0, r0, 0x0)
>
> This isn't all that helpful for reproducing the issue. Are the later
> lines the contents of 'perf.syz'? My local build of syz-execprog can't
> seem to parse this and prints help/usage.
>
> Has your syzkaller instance managed to generate a C reproducer that you
> can share?
>
> It should be possible to manually build a test from the above, but
> that's rather tedious.
>
> > ------------[ cut here ]------------
> > UBSAN: shift-out-of-bounds in drivers/perf/arm_pmuv3.c:716:25
> > shift exponent -1 is negative
> > CPU: 0 UID: 0 PID: 8405 Comm: syz.3.19 Tainted: G W 6.16.0-rc2-g5982a539cdce #3 PREEMPT
> > Tainted: [W]=WARN
> > Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
> > Call trace:
> > show_stack+0x2c/0x3c (C)
> > __dump_stack+0x30/0x40
> > dump_stack_lvl+0xd8/0x12c
> > dump_stack+0x1c/0x28
> > ubsan_epilogue+0x14/0x48
> > __ubsan_handle_shift_out_of_bounds+0x2b0/0x34c
> > armv8pmu_enable_event+0x3c4/0x4b0
> > armpmu_start+0xc4/0x118
> > perf_event_unthrottle_group+0x3a8/0x50c
> > perf_adjust_freq_unthr_events+0x2f4/0x578
> > perf_adjust_freq_unthr_context+0x278/0x46c
> > perf_event_task_tick+0x394/0x5b0
>
> AFAICT perf_event_task_tick() is called with IRQs masked, and
> perf_adjust_freq_unthr_context() disables the PMU for the duration of
> the state manipulation, so this shouldn't be able to race with anything
> that's using appropriate IRQ masking.
>
> This might be able to race with a pNMI though, and it looks like we're
> not entirely robust.
>
> > sched_tick+0x314/0x6cc
> > update_process_times+0x374/0x4b0
> > tick_nohz_handler+0x334/0x480
> > __hrtimer_run_queues+0x3ec/0xb78
> > hrtimer_interrupt+0x2b8/0xb50
> > arch_timer_handler_virt+0x74/0x88
> > handle_percpu_devid_irq+0x174/0x308
> > generic_handle_domain_irq+0xe0/0x140
> > gic_handle_irq+0x6c/0x190
> > call_on_irq_stack+0x24/0x30
> > do_interrupt_handler+0xd4/0x138
> > el1_interrupt+0x34/0x54
> > el1h_64_irq_handler+0x18/0x24
> > el1h_64_irq+0x6c/0x70
> > generic_exec_single+0x2dc/0x304 (P)
> > smp_call_function_single+0x308/0x530
> > perf_install_in_context+0x4a0/0x798
> > __arm64_sys_perf_event_open+0x184c/0x1d50
> > invoke_syscall+0x98/0x2b8
> > el0_svc_common+0x130/0x23c
> > do_el0_svc+0x48/0x58
> > el0_svc+0x58/0x17c
> > el0t_64_sync_handler+0x78/0x108
> > el0t_64_sync+0x198/0x19c
> > ---[ end trace ]---
> > ------------[ cut here ]------------
> > UBSAN: shift-out-of-bounds in drivers/perf/arm_pmuv3.c:658:13
> > shift exponent -1 is negative
> > CPU: 0 UID: 0 PID: 8006 Comm: syz.0.19 Not tainted 6.16.0-rc2-g5982a539cdce #3 PREEMPT
> > Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
> > Call trace:
> > show_stack+0x2c/0x3c (C)
> > __dump_stack+0x30/0x40
> > dump_stack_lvl+0xd8/0x12c
> > dump_stack+0x1c/0x28
> > ubsan_epilogue+0x14/0x48
> > __ubsan_handle_shift_out_of_bounds+0x2b0/0x34c
> > armv8pmu_disable_event+0x228/0x2f8
> > armpmu_stop+0xa0/0x104
> > perf_event_throttle_group+0x354/0x4cc
> > __perf_event_account_interrupt+0x26c/0x290
> > __perf_event_overflow+0xe8/0xd28
> > perf_event_overflow+0x38/0x4c
> > armv8pmu_handle_irq+0x244/0x320
> > armpmu_dispatch_irq+0x6c/0x9c
> > handle_percpu_devid_irq+0x174/0x308
> > generic_handle_domain_irq+0xe0/0x140
> > gic_handle_irq+0x6c/0x190
> > call_on_irq_stack+0x24/0x30
> > do_interrupt_handler+0xd4/0x138
> > el1_interrupt+0x34/0x54
> > el1h_64_irq_handler+0x18/0x24
> > el1h_64_irq+0x6c/0x70
>
> This indicates a regular IRQ rather than a pNMI.
>
> > generic_exec_single+0x2dc/0x304 (P)
> > smp_call_function_single+0x308/0x530
> > perf_install_in_context+0x4a0/0x798
> > __arm64_sys_perf_event_open+0x184c/0x1d50
> > invoke_syscall+0x98/0x2b8
> > el0_svc_common+0x130/0x23c
> > do_el0_svc+0x48/0x58
> > el0_svc+0x58/0x17c
> > el0t_64_sync_handler+0x78/0x108
> > el0t_64_sync+0x198/0x19c
> > ---[ end trace ]---
> > ------------[ cut here ]------------
> > UBSAN: shift-out-of-bounds in drivers/perf/arm_pmuv3.c:730:26
> > shift exponent -1 is negative
> > CPU: 0 UID: 0 PID: 8006 Comm: syz.0.19 Not tainted 6.16.0-rc2-g5982a539cdce #3 PREEMPT
> > Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
> > Call trace:
> > show_stack+0x2c/0x3c (C)
> > __dump_stack+0x30/0x40
> > dump_stack_lvl+0xd8/0x12c
> > dump_stack+0x1c/0x28
> > ubsan_epilogue+0x14/0x48
> > __ubsan_handle_shift_out_of_bounds+0x2b0/0x34c
> > armv8pmu_disable_event+0x298/0x2f8
> > armpmu_stop+0xa0/0x104
> > perf_event_throttle_group+0x354/0x4cc
> > __perf_event_account_interrupt+0x26c/0x290
> > __perf_event_overflow+0xe8/0xd28
> > perf_event_overflow+0x38/0x4c
> > armv8pmu_handle_irq+0x244/0x320
> > armpmu_dispatch_irq+0x6c/0x9c
> > handle_percpu_devid_irq+0x174/0x308
> > generic_handle_domain_irq+0xe0/0x140
> > gic_handle_irq+0x6c/0x190
> > call_on_irq_stack+0x24/0x30
> > do_interrupt_handler+0xd4/0x138
> > el1_interrupt+0x34/0x54
> > el1h_64_irq_handler+0x18/0x24
> > el1h_64_irq+0x6c/0x70
> > generic_exec_single+0x2dc/0x304 (P)
> > smp_call_function_single+0x308/0x530
> > perf_install_in_context+0x4a0/0x798
> > __arm64_sys_perf_event_open+0x184c/0x1d50
> > invoke_syscall+0x98/0x2b8
> > el0_svc_common+0x130/0x23c
> > do_el0_svc+0x48/0x58
> > el0_svc+0x58/0x17c
> > el0t_64_sync_handler+0x78/0x108
> > el0t_64_sync+0x198/0x19c
> > ---[ end trace ]---
> >
> > Fixes: 29227d6ea157 ("arm64: perf: Clean up enable/disable calls")
>
> As above, I do not believe that this fixes tag is accurate.
>
> > Signed-off-by: Yunseong Kim <ysk@...lloc.com>
> > Tested-by: Yunseong Kim <ysk@...lloc.com>
> > Cc: Yeoreum Yun <yeoreum.yun@....com>
> > Cc: syzkaller@...glegroups.com
> > ---
> > drivers/perf/arm_pmuv3.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> > index 3db9f4ed17e8..846d69643fd8 100644
> > --- a/drivers/perf/arm_pmuv3.c
> > +++ b/drivers/perf/arm_pmuv3.c
> > @@ -795,6 +795,9 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
> >
> > static void armv8pmu_enable_event(struct perf_event *event)
> > {
> > + if (unlikely(event->hw.idx < 0))
> > + return;
> > +
> > armv8pmu_write_event_type(event);
> > armv8pmu_enable_event_irq(event);
> > armv8pmu_enable_event_counter(event);
> > @@ -802,6 +805,9 @@ static void armv8pmu_enable_event(struct perf_event *event)
> >
> > static void armv8pmu_disable_event(struct perf_event *event)
> > {
> > + if (unlikely(event->hw.idx < 0))
> > + return;
> > +
> > armv8pmu_disable_event_counter(event);
> > armv8pmu_disable_event_irq(event);
>
> As above, this is not a correct fix, and NAK to silently ignoring an
> invalid idx.
>
> Mark.
>