Message-ID: <ff282f45-9f17-5790-174c-e765aae0038c@maine.edu>
Date: Tue, 29 Jul 2025 12:50:34 -0400 (EDT)
From: Vince Weaver <vincent.weaver@...ne.edu>
To: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
cc: Vince Weaver <vincent.weaver@...ne.edu>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org, "Liang, Kan" <kan.liang@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>
Subject: Re: [perf] fuzzer triggers "BUG: kernel NULL pointer dereference"
On Tue, 29 Jul 2025, Mi, Dapeng wrote:
> Could you please provide more information about this issue? Like HW
> information, how long can the issue be produced and whether the issue can
> be seen in latest kernel (6.16)? Thanks.
I just reproduced this with current git (6.16.0+).
This is on a Raptor Lake system.
I can reproduce the issue with the perf_fuzzer, but it seems to be timing
sensitive: if I enable fuzzer trace logging to try to build a reproducible
test case, it no longer triggers.
The system locks up extremely hard, so the only way I can capture the panic
message is by taking a picture of the screen.
I can try enabling KASAN to see if that helps get better debug messages.
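A debug configuration for such a run might look like the fragment below.
This is only a sketch of options that seem relevant, not something already
tested on this machine: the KASAN lines select the generic software mode,
and the debugobjects timer checks are an added assumption, included because
the crash is in the hrtimer queue.

```
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
```

The extra instrumentation changes timing, so it may hide the problem the
same way the fuzzer trace logging does.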
Vince
>
> --
>
> Dapeng Mi
>
> On 7/22/2025 5:17 AM, Vince Weaver wrote:
> > I'm still tracking this fuzzer issue. The fuzzer can reliably trigger the
> > crash but only 32000 syscalls deep into a run and I am having a lot of
> > trouble trying to gather a trace/testcase that can generate it.
> >
> > I was hoping the recent
> > [PATCH] perf/x86: Check if cpuc->events[*] pointer exists before accessing it
> > patch might fix things as the symptoms were vaguely similar but that
> > particular patch does not fix the problem.
> >
> > Vince
> >
> > On Tue, 8 Jul 2025, Vince Weaver wrote:
> >
> >> Hello
> >>
> >> the perf_fuzzer can reliably trigger this on a 6.16-rc2 kernel. It
> >> doesn't look obviously perf related but since the perf_fuzzer triggered it
> >> I thought I'd report it as a perf issue first. I can work on a smaller
> >> test case but that might take a bit especially as the machine locks up
> >> super hard and requires being unplugged after it's triggered.
> >>
> >> let me know if there's any other info I can provide. The dump below is
> >> transcribed from a screenshot as I still haven't figured out a way to get
> >> a serial console on this Raptorlake system.
> >>
> >> BUG: kernel NULL pointer dereference, address: 0000000000000008
> >> #PF: supervisor read access in kernel mode
> >> #PF: error_code(0x0000) - not-present page
> >> PGD 0 P4D 0
> >> Oops: Oops: 0000 [#1] SMP NOPTI
> >> CPU: 5 UID: 0 PID: 0 Comm: swapper/5 Not tainted 6.16.0-rc2+ #8 PREEMPT (voluntary)
> >> Hardware name: Dell Inc. Precision 3660/0VJ7G2
> >> RIP: 0010:rb_insert_color+0x18/0x130
> >> Code: 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 8b 07
> >> RSP: 0018:ffffb5e5c01e3df8 EFLAGS: 00010046
> >> RAX: ffff93f1927f8168 .....
> >> ...
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 000000000000008 CR3: 00000000596824001 CR4: 000000000000f72ef0
> >> DR0: 00000000a000001 ....
> >> PKRU: 55555554
> >> Call Trace:
> >> <TASK>
> >> timerqueue_add+0x66/0xb0
> >> hrtimer_start_range_ns+0x102/0x420
> >> ? next_zone+0x42/0x70
> >> tick_nohz_stop_tick+0xce/0x230
> >> tick_nohz_idle_stop_tick+0x70/0xd0
> >> do_idle+0x1d3/0x240
> >> cpu_startup_entry+0x29/0x30
> >> start_secondary+0x119/0x140
> >> common_startup_64+0x13e/0x141
> >> </TASK>
> >>
> >>
> >>
>
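One note on the fault address in the quoted dump: 0x8 is what a read of the
second pointer-sized field through a NULL struct pointer would produce. The
sketch below mirrors the field order of struct rb_node from the kernel's
rbtree headers using Python's ctypes and reports which field a given offset
falls in. RbNode and field_at are hypothetical names used only for this
illustration. It assumes the host running the script has the same pointer
width as the crashed kernel, and it does not identify which rb_node pointer
was NULL.

```python
import ctypes


class RbNode(ctypes.Structure):
    # Same field order as the kernel's struct rb_node. The first field is
    # "unsigned long __rb_parent_color" in the kernel; c_size_t is used here
    # as a pointer-width stand-in (an assumption of this sketch).
    _fields_ = [
        ("rb_parent_color", ctypes.c_size_t),
        ("rb_right", ctypes.c_void_p),
        ("rb_left", ctypes.c_void_p),
    ]


def field_at(fault_addr):
    """Return the field a NULL-based access at fault_addr would touch."""
    for name, _ctype in RbNode._fields_:
        desc = getattr(RbNode, name)
        if desc.offset <= fault_addr < desc.offset + desc.size:
            return name
    return None  # address is past the end of the structure


if __name__ == "__main__":
    print(field_at(0x8))
```

On a 64-bit host, offset 0x8 lands in rb_right, which is consistent with
rb_insert_color reading the rb_right of a NULL node, though the dump alone
does not prove that.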