Message-ID: <14fb716a-dedf-482a-8518-e5cc26165e97@kzalloc.com>
Date: Thu, 24 Jul 2025 03:57:42 +0900
From: Yunseong Kim <ysk@...lloc.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Will Deacon <will@...nel.org>, Austin Kim <austindh.kim@...il.com>,
 Michelle Jin <shjy180909@...il.com>, linux-arm-kernel@...ts.infradead.org,
 linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
 Yeoreum Yun <yeoreum.yun@....com>, syzkaller@...glegroups.com,
 Kan Liang <kan.liang@...ux.intel.com>, Peter Zijlstra
 <peterz@...radead.org>, Namhyung Kim <namhyung@...nel.org>
Subject: Re: [PATCH] perf: arm_pmuv3: Fix kernel panic on UBSAN from negative
 hw.idx in armv8pmu_enable/disable_event()

Hi Mark, thanks so much for taking the time to review this!

On 7/24/25 2:39 AM, Mark Rutland wrote:
> On Wed, Jul 23, 2025 at 01:35:31PM +0100, Mark Rutland wrote:
>> [ dropping Hemendra, since he doesn't need to be spammed with ML traffic ]
>>
>> On Wed, Jul 23, 2025 at 10:44:03AM +0000, Yunseong Kim wrote:
>>> When 'event->hw.idx' was negative in armv8pmu_enable/disable_event().
>>>
>>>   UBSAN: shift-out-of-bounds in drivers/perf/arm_pmuv3.c:716:25
>>>   shift exponent -1 is negative
>>>
>>>   UBSAN: shift-out-of-bounds in drivers/perf/arm_pmuv3.c:658:13
>>>   shift exponent -1 is negative
>>>
>>> This occurred because a perf_event could reach armv8pmu event with a
>>> negative idx, typically when a valid counter could not be allocated.
>>
>> These functions are never supposed to be called for an event with a
>> negative idx. For that to happen there must either be an earlier bug (at
>> the time of pmu::add()) or there's a concurrency bug.
> 
> AFAICT this is a result of the group throttling logic introduced in
> commit:
> 
>   9734e25fbf5ae68e ("perf: Fix the throttle logic for a group")
> 
> That doesn't take into account that sibling events could have
> event->state <= PERF_EVENT_STATE_OFF, e.g. by virtue of
> perf_event_attr::disabled. For those events, event_sched_in() won't
> initialise the event, e.g. won't call event->pmu->add().
> 
> Thus when perf_event_throttle_group() and perf_event_unthrottle_group()
> iterate over events with:
> 
> 	for_each_sibling_event(sibling, leader)
> 		perf_event_[un]throttle(sibling, ...);
> 
> ... perf_event_[un]throttle() will call event->pmu->stop() and
> event->pmu->start() for those disabled events, resulting in the UBSAN
> splat above since they don't have a hw idx (which is assigned by
> event->pmu->add()).
> 
> I think the event's state needs to be taken into account somewhere
> during throttling. Given the example of event_sched_in(), I'd assume that
> should be in the core code, rather than in each architecture's
> pmu::{start,stop}().

Appreciate your guidance on this.

I'll revise the relevant part of the code and run some tests again to see if the
issue still reproduces after the adjustment.
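
To check my understanding of the failure mode described above, here is a
minimal userspace model. It is only a sketch: every name prefixed with
model_ is hypothetical, the state values are model values, and the "checked"
variant is my assumption about what "taking the event's state into account"
could look like, not the actual core change.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the perf event states; model values only. */
enum model_state {
	MODEL_STATE_OFF = -1,
	MODEL_STATE_INACTIVE = 0,
	MODEL_STATE_ACTIVE = 1,
};

struct model_event {
	enum model_state state;
	int hw_idx;		/* stays -1 until the model's add() assigns a counter */
	int bad_stop_calls;	/* stop() invocations that saw hw_idx < 0 */
};

/* Models pmu->stop(): a real driver would compute a shift by hw_idx here. */
static void model_pmu_stop(struct model_event *e)
{
	if (e->hw_idx < 0)
		e->bad_stop_calls++;
}

/* Behaviour described in the review: every sibling is stopped. */
static void model_throttle_group_unchecked(struct model_event *sib, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_pmu_stop(&sib[i]);
}

/* Assumed adjustment: skip siblings that were never scheduled in. */
static void model_throttle_group_checked(struct model_event *sib, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (sib[i].state != MODEL_STATE_ACTIVE)
			continue;
		model_pmu_stop(&sib[i]);
	}
}

/* Builds one active sibling and one disabled sibling, then throttles. */
static int model_count_bad_stops(int checked)
{
	struct model_event group[2] = {
		{ MODEL_STATE_ACTIVE, 0, 0 },
		{ MODEL_STATE_OFF, -1, 0 },
	};

	if (checked)
		model_throttle_group_checked(group, 2);
	else
		model_throttle_group_unchecked(group, 2);

	return group[0].bad_stop_calls + group[1].bad_stop_calls;
}
```

In this model, model_count_bad_stops(0) reaches the stop callback once with
a negative index (the disabled sibling), while model_count_bad_stops(1)
never does.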

> I can reproduce this locally with:
> 
> | #include <stdio.h>
> | #include <stdlib.h>
> | #include <unistd.h>
> | 
> | #include <sys/syscall.h>
> | #include <sys/types.h>
> | 
> | #include <linux/perf_event.h>
> | 
> | static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
> |                            int group_fd, unsigned long flags)
> | {
> |         return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
> | }
> | 
> | struct perf_event_attr attr_parent = {
> |         .type = PERF_TYPE_HARDWARE,
> |         .size = sizeof(attr_parent),
> |         .config = PERF_COUNT_HW_CPU_CYCLES,
> |         .sample_period = 1,
> |         .exclude_kernel = 1,
> | };
> | 
> | struct perf_event_attr attr_child = {
> |         .type = PERF_TYPE_HARDWARE,
> |         .size = sizeof(attr_child),
> |         .config = PERF_COUNT_HW_CPU_CYCLES,
> |         .exclude_kernel = 1,
> |         .disabled = 1,
> | };
> | 
> | int main(int argc, char *argv[])
> | {
> |         int parent, child;
> | 
> |         parent = perf_event_open(&attr_parent, 0, -1, -1, 0);
> |         if (parent < 0) {
> |                 fprintf(stderr, "Unable to create event: %d\n", parent);
> |                 exit (-1);
> |         }
> | 
> |         child = perf_event_open(&attr_child, 0, -1, parent, 0);
> |         if (child < 0) {
> |                 fprintf(stderr, "Unable to create event: %d\n", child);
> |                 exit (-1);
> |         }
> | 
> |         for (;;) {
> |                 asm("" ::: "memory");
> |         }
> | 
> |         return 0;
> | }
> 
> I've kept the context below for anyone new to this thread, but hopefully
> the above is clear?
> 
> Mark.

I tried to reproduce the issue with the code you shared, but it did not trigger
the UBSAN splat on my end. Do I need to run it in multiple processes?

>>> This issue was observed when running KVM on Radxa's Orion6 platform.
>>
>> Do you mean that you see this on the host, or in a guest?

In a guest. I observed this issue inside a KVM guest running on this board.

>> Are you using pseudo-NMI?

$ grep CONFIG_ARM64_PSEUDO_NMI .config
# CONFIG_ARM64_PSEUDO_NMI is not set

No. I checked, and pseudo-NMI is not enabled in the config I built. I've
attached the 'config-UBSAN-negative-idx' file as well.

>>> The issue was previously guarded indirectly by armv8pmu_event_is_chained(),
>>> which internally warned and returned false for idx < 0. But since the
>>> commit 29227d6ea157 ("arm64: perf: Clean up enable/disable calls"), this
>>> check was removed.
>>
>> The warning there was because this case *should not happen*, and
>> returning false was a way of minimizing the risk of a crash before the
>> warning was logged. 
>>
>> I don't think that armv8pmu_event_is_chained() would avoid the UBSAN
>> splat. Prior to commit 29227d6ea157 we had:
>>
>> | static inline void armv8pmu_enable_event_counter(struct perf_event *event)
>> | {
>> |         struct perf_event_attr *attr = &event->attr;
>> |         int idx = event->hw.idx;
>> | 
>> | 	...
>> | 
>> |         if (!kvm_pmu_counter_deferred(attr)) {
>> |                 armv8pmu_enable_counter(idx);
>> |                 if (armv8pmu_event_is_chained(event))
>> |                         armv8pmu_enable_counter(idx - 1);
>> |         }
>> | }
>>
>> Note the first call to armv8pmu_enable_counter(idx), so this wouldn't
>> help for an event with event->hw.idx==-1, and the only other way we
>> could get here is with a chained event with event->hw.idx==0, which is
>> not valid.
>>
>>> To prevent undefined behavior, add an explicit guard to early return from
>>> armv8pmu event if hw.idx < 0, 
>>
>> That is not a correct fix, and simply hides the real bug. It should not
>> be possible to reach this code when hw.idx < 0, and idx should be >= 0
>> whenever pmu::add() succeeds.

Now I understand.
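
For anyone following along, this small userspace model captures the ordering
point about the old code path. It is a sketch under assumptions: the model_
names are hypothetical, the kvm_pmu_counter_deferred() check is left out, and
"wants_chain" is simply a model input standing in for whether the event is
configured as chained.

```c
#include <stdbool.h>

#define MODEL_MAX_CALLS 4

/* Records every index handed to the model's enable_counter(). */
struct model_trace {
	int idx[MODEL_MAX_CALLS];
	int n;
};

static void model_enable_counter(struct model_trace *t, int idx)
{
	if (t->n < MODEL_MAX_CALLS)
		t->idx[t->n++] = idx;
}

/* Models the old check: returns false for idx < 0 (the warning is omitted). */
static bool model_event_is_chained(int idx, bool wants_chain)
{
	if (idx < 0)
		return false;
	return wants_chain;
}

/* Same call order as the pre-29227d6ea157 snippet quoted above. */
static void model_old_enable_event_counter(struct model_trace *t, int idx,
					   bool wants_chain)
{
	model_enable_counter(t, idx);	/* runs before any idx check */
	if (model_event_is_chained(idx, wants_chain))
		model_enable_counter(t, idx - 1);
}

/* Counts how many negative indices reached enable_counter(). */
static int model_negative_indices_seen(int idx, bool wants_chain)
{
	struct model_trace t = { { 0 }, 0 };
	int neg = 0;

	model_old_enable_event_counter(&t, idx, wants_chain);
	for (int i = 0; i < t.n; i++)
		if (t.idx[i] < 0)
			neg++;

	return neg;
}
```

With idx == -1 the first call already receives the negative index whether or
not the event is chained, and a chained event with idx == 0 passes -1 in the
second call, which matches the two cases described in the review.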

>>> similar to handling in other PMU drivers.
>>> (e.g. intel_pmu_disable_event() on arch/x86/events/intel/core.c)
>>
>> I think what you're saying here is that intel_pmu_disable_event() will
>> pr_warn() and return early in this case. As above, that is because this
>> case is not expected to occur, and indicates a bug elsewhere.

Thanks for the explanation, Mark. That helps me understand why the other
architectures do not guard against a negative idx in *_pmu_disable_event().
I was confused about that while working on the code.

>>> $ ./syz-execprog -executor=./syz-executor -repeat=0  -sandbox=none \
>>>   -disable=binfmt_misc,cgroups,close_fds,devlink_pci,ieee802154,net_dev,net_reset,nic_vf,swap,sysctl,tun,usb,vhci,wifi \
>>>   -procs=8 perf.syz
>>
>> This isn't all that helpful for reproducing the issue. Are the later
>> lines the contents of 'perf.syz'? My local build of syz-execprog can't
>> seem to parse this and prints help/usage.
>>
>> Has your syzkaller instance managed to generate a C reproducer that you
>> can share?
>> It should be possible to manually build a test from the above, but
>> that's rather tedious.

I've attached both 'perf.syz' and the generated C reproducer 'perf.c' for
reference.

>>> ------------[ cut here ]------------
>>> UBSAN: shift-out-of-bounds in drivers/perf/arm_pmuv3.c:716:25
>>> shift exponent -1 is negative
>>> CPU: 0 UID: 0 PID: 8405 Comm: syz.3.19 Tainted: G        W           6.16.0-rc2-g5982a539cdce #3 PREEMPT
>>> Tainted: [W]=WARN
>>> Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
>>> Call trace:
>>>  show_stack+0x2c/0x3c (C)
>>>  __dump_stack+0x30/0x40
>>>  dump_stack_lvl+0xd8/0x12c
>>>  dump_stack+0x1c/0x28
>>>  ubsan_epilogue+0x14/0x48
>>>  __ubsan_handle_shift_out_of_bounds+0x2b0/0x34c
>>>  armv8pmu_enable_event+0x3c4/0x4b0
>>>  armpmu_start+0xc4/0x118
>>>  perf_event_unthrottle_group+0x3a8/0x50c
>>>  perf_adjust_freq_unthr_events+0x2f4/0x578
>>>  perf_adjust_freq_unthr_context+0x278/0x46c
>>>  perf_event_task_tick+0x394/0x5b0
>>
>> AFAICT perf_event_task_tick() is called with IRQs masked, and
>> perf_adjust_freq_unthr_context() disables the PMU for the duration of
>> the state manipulation, so this shouldn't be able to race with anything
>> that's using appropriate IRQ masking.
>>
>> This might be able to race with a pNMI though, and it looks like we're
>> not entirely robust.
>>
>>> Fixes: 29227d6ea157 ("arm64: perf: Clean up enable/disable calls")
>>
>> As above, I do not believe that this fixes tag is accurate.

Thanks again for your insights and checking this part.

>>> Signed-off-by: Yunseong Kim <ysk@...lloc.com>
>>> Tested-by: Yunseong Kim <ysk@...lloc.com>
>>> Cc: Yeoreum Yun <yeoreum.yun@....com>
>>> Cc: syzkaller@...glegroups.com
>>> ---
>>>  drivers/perf/arm_pmuv3.c | 6 ++++++
>>>  1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>>> index 3db9f4ed17e8..846d69643fd8 100644
>>> --- a/drivers/perf/arm_pmuv3.c
>>> +++ b/drivers/perf/arm_pmuv3.c
>>> @@ -795,6 +795,9 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
>>>  
>>>  static void armv8pmu_enable_event(struct perf_event *event)
>>>  {
>>> +	if (unlikely(event->hw.idx < 0))
>>> +		return;
>>> +
>>>  	armv8pmu_write_event_type(event);
>>>  	armv8pmu_enable_event_irq(event);
>>>  	armv8pmu_enable_event_counter(event);
>>> @@ -802,6 +805,9 @@ static void armv8pmu_enable_event(struct perf_event *event)
>>>  
>>>  static void armv8pmu_disable_event(struct perf_event *event)
>>>  {
>>> +	if (unlikely(event->hw.idx < 0))
>>> +		return;
>>> +
>>>  	armv8pmu_disable_event_counter(event);
>>>  	armv8pmu_disable_event_irq(event);
>>
>> As above, this is not a correct fix, and NAK to silently ignoring an
>> invalid idx.
>>
>> Mark.

Many thanks, Mark, for your insightful and careful review.

Best regards,
Yunseong


View attachment "config-UBSAN-negative-idx" of type "text/plain" (343965 bytes)

View attachment "perf.c" of type "text/plain" (10374 bytes)

View attachment "perf.syz" of type "text/plain" (1052 bytes)
