Message-ID: <20230723103210.4b1b032e@rorschach.local.home>
Date: Sun, 23 Jul 2023 10:32:10 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: <kkabe@...a.pgw.jp>
Cc: regressions@...ts.linux.dev, bagasdotme@...il.com,
alexander.deucher@....com, christian.koenig@....com,
Xinhui.Pan@....com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
linux-kernel@...r.kernel.org, amd-gfx@...ts.freedesktop.org
Subject: Re: radeon.ko/i586: BUG: kernel NULL pointer
dereference,address:00000004
On Sun, 23 Jul 2023 20:55:06 +0900
<kkabe@...a.pgw.jp> wrote:
> So I tried to trap NULL and return:
>
> ================ patch-drm_vblank_cancel_pending_works-printk-NULL-ret.patch
> diff -up ./drivers/gpu/drm/drm_vblank_work.c.pk2 ./drivers/gpu/drm/drm_vblank_work.c
> --- ./drivers/gpu/drm/drm_vblank_work.c.pk2 2023-06-06 20:50:40.000000000 +0900
> +++ ./drivers/gpu/drm/drm_vblank_work.c 2023-07-23 14:29:56.383093673 +0900
> @@ -71,6 +71,10 @@ void drm_vblank_cancel_pending_works(str
>  {
>  	struct drm_vblank_work *work, *next;
>
> +	if (!vblank->dev) {
> +		printk(KERN_WARNING "%s: vblank->dev == NULL? returning\n", __func__);
> +		return;
> +	}
>  	assert_spin_locked(&vblank->dev->event_lock);
>
>  	list_for_each_entry_safe(work, next, &vblank->pending_work, node) {
> ================
>
> This time, the printk trap does not happen!! and radeon.ko works.
> (NULL check for vblank->worker is still firing though)
>
> Now this is puzzling.
> Is this a timing issue?
It could very well be, and the ftrace patch may not be the cause at
all. The thread that is created to do the work opens up the race
window, which is why you see the bug with the patch and not without
it. The patch may not be the problem; it may just tickle the timings
enough to trigger the bug, sending you on a wild goose chase in the
wrong direction.
-- Steve
> Is systemd-udevd doing something not favorable to the kernel?
> Is drm vblank code running without enough initialization?
>
> What is puzzling is that purely userland activity
> (logging in on tty1 before radeon.ko load)
> is affecting kernel panic/no-panic.