Message-ID: <CABXGCsO8_GXZQ9tJYZJDbO7oGvsHyVS-32L1PZ7YNL0SrA1RFg@mail.gmail.com>
Date: Mon, 6 Oct 2025 00:11:38 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: "Pillai, Aurabindo" <aurabindo.pillai@....com>, nicholas.kazlauskas@....com,
"Wu, Ray" <ray.wu@....com>, "Wheeler, Daniel" <daniel.wheeler@....com>, roman.li@....com,
"Chung, ChiaHsuan (Tom)" <chiahsuan.chung@....com>, "Deucher, Alexander" <alexander.deucher@....com>
Cc: amd-gfx list <amd-gfx@...ts.freedesktop.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: 6.18/regression/bisected – BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 after 6d31602a9f57
Hi,
After updating to a recent kernel, the system log becomes flooded with
the following stack trace repeating every second:
[ 17.380675] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[ 17.380676] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/27
[ 17.380677] preempt_count: 10003, expected: 0
[ 17.380678] RCU nest depth: 0, expected: 0
[ 17.380679] INFO: lockdep is turned off.
[ 17.380680] irq event stamp: 99468
[ 17.380680] hardirqs last enabled at (99467): [<ffffffff8739e346>] cpuidle_enter_state+0xf6/0x4e0
[ 17.380682] hardirqs last disabled at (99468): [<ffffffff87398733>] common_interrupt+0x13/0xe0
[ 17.380683] softirqs last enabled at (99454): [<ffffffff83c16659>] handle_softirqs+0x579/0x840
[ 17.380685] softirqs last disabled at (99437): [<ffffffff83c16a56>] __irq_exit_rcu+0x126/0x240
[ 17.380687] Preemption disabled at:
[ 17.380688] [<ffffffff83e262d3>] __raw_spin_lock_irq+0x23/0x90
[ 17.380690] CPU: 27 UID: 0 PID: 0 Comm: swapper/27 Tainted: G W L ------ --- 6.18.0-0.rc0.251003ge406d57be7bd.6.fc44.x86_64+debug #1 PREEMPT(lazy)
[ 17.380692] Tainted: [W]=WARN, [L]=SOFTLOCKUP
[ 17.380693] Hardware name: ASRock B650I Lightning WiFi/B650I Lightning WiFi, BIOS 3.30 06/16/2025
[ 17.380693] Call Trace:
[ 17.380694] <IRQ>
[ 17.380695] dump_stack_lvl+0x84/0xd0
[ 17.380697] ? __raw_spin_lock_irq+0x23/0x90
[ 17.380698] __might_resched.cold+0x213/0x29d
[ 17.380700] ? rcu_is_watching+0x15/0xe0
[ 17.380702] ? __pfx___might_resched+0x10/0x10
[ 17.380704] ? rcu_is_watching+0x15/0xe0
[ 17.380706] __kmalloc_cache_noprof+0x480/0x8d0
[ 17.380707] ? ktime_get_raw+0x27/0x170
[ 17.380709] ? schedule_dc_vmin_vmax+0x9a/0x3e0 [amdgpu]
[ 17.381022] ? ktime_get_raw+0x5c/0x170
[ 17.381024] ? schedule_dc_vmin_vmax+0x9a/0x3e0 [amdgpu]
[ 17.381323] schedule_dc_vmin_vmax+0x9a/0x3e0 [amdgpu]
[ 17.381614] dm_crtc_high_irq+0x7bf/0xbd0 [amdgpu]
[ 17.381902] ? amdgpu_dm_irq_handler+0xf3/0x2a0 [amdgpu]
[ 17.382193] amdgpu_dm_irq_handler+0x19a/0x2a0 [amdgpu]
[ 17.382477] ? rcu_is_watching+0x15/0xe0
[ 17.382479] amdgpu_irq_dispatch+0x307/0x660 [amdgpu]
[ 17.382747] ? find_held_lock+0x2b/0x80
[ 17.382748] ? local_clock_noinstr+0xf/0x130
[ 17.382750] ? __pfx_amdgpu_irq_dispatch+0x10/0x10 [amdgpu]
[ 17.383004] ? __lock_release.isra.0+0x1cb/0x340
[ 17.383007] ? timekeeping_update_from_shadow+0x3df/0x7d0
[ 17.383009] ? timekeeping_adjust+0x47/0x740
[ 17.383011] amdgpu_ih_process+0x188/0x530 [amdgpu]
[ 17.383257] ? __pfx_amdgpu_irq_handler+0x10/0x10 [amdgpu]
[ 17.383499] amdgpu_irq_handler+0x27/0xb0 [amdgpu]
[ 17.383740] ? __lock_release.isra.0+0x1cb/0x340
[ 17.383742] ? __pfx_amdgpu_irq_handler+0x10/0x10 [amdgpu]
[ 17.383988] __handle_irq_event_percpu+0x1be/0x610
[ 17.383991] handle_irq_event+0xab/0x1c0
[ 17.383993] handle_edge_irq+0x2f1/0x890
[ 17.383995] __common_interrupt+0xac/0x1a0
[ 17.383997] common_interrupt+0xb0/0xe0
[ 17.383999] </IRQ>
[ 17.383999] <TASK>
[ 17.384000] asm_common_interrupt+0x26/0x40
[ 17.384001] RIP: 0010:cpuidle_enter_state+0xfc/0x4e0
[ 17.384003] Code: 73 04 bf ff ff ff ff 49 89 c7 e8 ef c6 3a ff 31 ff e8 68 5d 9f fc 45 84 f6 0f 85 9e 01 00 00 e8 fa 94 db fc fb 0f 1f 44 00 00 <45> 85 ed 0f 88 69 01 00 00 48 8b 3c 24 4d 63 e5 e8 7f b2 3a ff 4c
[ 17.384004] RSP: 0018:ffffc9000044fd90 EFLAGS: 00000286
[ 17.384005] RAX: 000000000001848b RBX: ffff8881241f8000 RCX: ffffffff8739e346
[ 17.384006] RDX: ffff888109080000 RSI: ffffffff8848deda RDI: ffffffff878d4c20
[ 17.384007] RBP: ffffffff89a89520 R08: 0000000000000000 R09: 0000000000000001
[ 17.384007] R10: 000000000000001b R11: 0000000000000000 R12: 0000000000000003
[ 17.384008] R13: 0000000000000003 R14: 0000000000000000 R15: 000000040bb740a9
[ 17.384009] ? cpuidle_enter_state+0xf6/0x4e0
[ 17.384011] ? tick_nohz_next_event+0x14b/0x3a0
[ 17.384014] cpuidle_enter+0x4c/0xb0
[ 17.384015] cpuidle_idle_call+0x1b1/0x270
[ 17.384017] ? __pfx_cpuidle_idle_call+0x10/0x10
[ 17.384019] ? __pfx_tsc_verify_tsc_adjust+0x10/0x10
[ 17.384021] ? rcu_is_watching+0x15/0xe0
[ 17.384022] ? trace_irq_enable.constprop.0+0xc0/0x100
[ 17.384024] ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170
[ 17.384026] do_idle+0xee/0x190
[ 17.384028] cpu_startup_entry+0x53/0x70
[ 17.384030] start_secondary+0x21e/0x2c0
[ 17.384031] ? __pfx_start_secondary+0x10/0x10
[ 17.384033] common_startup_64+0x13e/0x141
[ 17.384036] </TASK>
[ 17.437100] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
The message appears roughly once per second.
git bisect points to the following commit as the first bad one:
commit 6d31602a9f57a7bb3c6c8dbde1d00af67e250a3f
Author: Aurabindo Pillai <aurabindo.pillai@....com>
Date: Wed Apr 16 11:26:54 2025 -0400
drm/amd/display: more liberal vmin/vmax update for freesync
[Why]
FAMS2 expects vmin/vmax to be updated in the case when freesync is
off, but supported. But we only update it when freesync is enabled.
[How]
Change the vsync handler such that dc_stream_adjust_vmin_vmax() its called
irrespective of whether freesync is enabled. If freesync is supported,
then there is no harm in updating vmin/vmax registers.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3546
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@....com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@....com>
Signed-off-by: Ray Wu <ray.wu@....com>
Tested-by: Daniel Wheeler <daniel.wheeler@....com>
Signed-off-by: Roman Li <roman.li@....com>
Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@....com>
Tested-by: Daniel Wheeler <daniel.wheeler@....com>
Signed-off-by: Alex Deucher <alexander.deucher@....com>
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
Unfortunately, I couldn’t test the kernel with this commit reverted,
because the revert does not apply cleanly:
$ git revert -n 6d31602a9f57a7bb3c6c8dbde1d00af67e250a3f
Auto-merging drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
error: could not revert 6d31602a9f57... drm/amd/display: more liberal vmin/vmax update for freesync
System info:
Kernel: 6.18.0-0.rc0.251003ge406d57be7bd.6.fc44.x86_64+debug (PREEMPT lazy)
GPU: AMD Radeon RX 7900 XTX (Navi 31)
Board: ASRock B650I Lightning WiFi, BIOS 3.30 (2025-06-16)
Display(s): LG OLED42C3
Connection type: HDMI
Full hardware probe: https://linux-hardware.org/?probe=3fb21a7f94
The trace always shows schedule_dc_vmin_vmax() being called from
dm_crtc_high_irq(), which runs in hard-IRQ context (in_atomic(): 1,
irqs_disabled(): 1, preempt_count: 10003).
It looks like this path now performs a memory allocation
(__kmalloc_cache_noprof) that may sleep inside the interrupt handler,
which triggers the “sleeping function called from invalid context”
warning.
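The "preempt_count: 10003" line can be decoded to confirm the hard-IRQ context. The sketch below is a userspace decoder. It assumes two things: that __might_resched() prints the value in hex, and that preempt_count uses the generic field layout from include/linux/preempt.h (8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits). The helper names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed field widths, mirroring the generic layout in include/linux/preempt.h. */
#define PREEMPT_BITS  8
#define SOFTIRQ_BITS  8
#define HARDIRQ_BITS  4
#define NMI_BITS      4

#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT     (HARDIRQ_SHIFT + HARDIRQ_BITS)

static_assert(NMI_SHIFT + NMI_BITS <= 32, "fields must fit in 32 bits");

/* Raw field values extracted from a preempt_count word (hypothetical helper type). */
struct preempt_fields {
    unsigned int preempt;  /* preempt_disable()/spin_lock nesting depth */
    unsigned int softirq;  /* raw softirq bits */
    unsigned int hardirq;  /* nonzero means hard-IRQ context */
    unsigned int nmi;      /* nonzero means NMI context */
};

/* Extract a field of 'bits' width starting at 'shift'. */
static unsigned int field(unsigned int pc, unsigned int shift, unsigned int bits)
{
    return (pc >> shift) & ((1u << bits) - 1u);
}

static struct preempt_fields decode_preempt_count(unsigned int pc)
{
    struct preempt_fields f = {
        .preempt = field(pc, PREEMPT_SHIFT, PREEMPT_BITS),
        .softirq = field(pc, SOFTIRQ_SHIFT, SOFTIRQ_BITS),
        .hardirq = field(pc, HARDIRQ_SHIFT, HARDIRQ_BITS),
        .nmi     = field(pc, NMI_SHIFT, NMI_BITS),
    };
    return f;
}

/* Print the decoded fields; pass the logged value as a hex literal. */
static void print_preempt_count(unsigned int pc)
{
    struct preempt_fields f = decode_preempt_count(pc);
    printf("preempt=%u softirq=%u hardirq=%u nmi=%u\n",
           f.preempt, f.softirq, f.hardirq, f.nmi);
}
```

Calling print_preempt_count(0x10003) prints "preempt=3 softirq=0 hardirq=1 nmi=0". That is one level of hard-IRQ nesting plus three levels of disabled preemption, which matches the "Preemption disabled at: __raw_spin_lock_irq" line in the log.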
This started right after commit 6d31602a9f57 (“more liberal vmin/vmax
update for freesync”).
Before that, there were no such warnings.
Aurabindo, could you please take a look?
It seems that the vmin/vmax update path now runs in interrupt context
and performs a sleeping allocation. Maybe the update needs to be
deferred to a workqueue, or the allocation switched to GFP_ATOMIC if
that is safe here.
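The workqueue option can be illustrated with a userspace model. The model has two halves:
- The "IRQ" half only records the request in a preallocated slot.
- A "worker" half runs later outside atomic context, where an allocation that may sleep is allowed.

Everything here is a hypothetical stand-in. A boolean replaces in_atomic(), a counter replaces the "BUG: sleeping function" warning, and a flag replaces queue_work(). None of these names are amdgpu or kernel APIs, and this is not necessarily how the driver should be fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel state (userspace model only). */
static bool in_atomic_ctx;            /* models in_atomic()/irqs_disabled() */
static int sleep_in_atomic_warnings;  /* models "BUG: sleeping function ..." */

/* Models a GFP_KERNEL-style allocation: it may sleep, so it "warns"
 * the way __might_resched() does when called from atomic context. */
static void *alloc_may_sleep(size_t size)
{
    if (in_atomic_ctx)
        sleep_in_atomic_warnings++;
    return malloc(size);
}

/* Hypothetical payload for a vmin/vmax update. */
struct vrr_request {
    int v_total_min;
    int v_total_max;
};

static struct vrr_request pending;  /* preallocated slot: no allocation in IRQ */
static bool work_pending;           /* kernel analogue: queue_work() */
static struct vrr_request applied;  /* last request the "worker" applied */

/* Reported pattern: allocate directly in the IRQ path. */
static void high_irq_alloc_inline(int vmin, int vmax)
{
    in_atomic_ctx = true;
    struct vrr_request *req = alloc_may_sleep(sizeof(*req));
    if (req) {
        req->v_total_min = vmin;
        req->v_total_max = vmax;
        applied = *req;
        free(req);
    }
    in_atomic_ctx = false;
}

/* Deferred pattern, IRQ half: record the request and flag the worker. */
static void high_irq_defer(int vmin, int vmax)
{
    in_atomic_ctx = true;
    pending.v_total_min = vmin;
    pending.v_total_max = vmax;
    work_pending = true;
    in_atomic_ctx = false;
}

/* Deferred pattern, worker half: process context, allocation may sleep. */
static void vrr_worker(void)
{
    assert(!in_atomic_ctx);
    if (!work_pending)
        return;
    struct vrr_request *req = alloc_may_sleep(sizeof(*req));
    if (!req)
        return;
    *req = pending;
    work_pending = false;
    applied = *req;
    free(req);
}
```

In this model, high_irq_alloc_inline() raises the simulated warning once per call. Calling high_irq_defer() and then vrr_worker() applies the same values with no warning.

The model ignores concurrency. In real code, the slot shared between the IRQ handler and the worker would need a lock taken with spin_lock_irqsave() or similar. A second request arriving before the worker runs would also overwrite the first, which may or may not be acceptable for "latest value wins" register updates. The GFP_ATOMIC alternative avoids the deferral but can fail under memory pressure, so the caller must handle a NULL return.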
I’ve attached:
- full kernel log (dmesg-6.18.0-0.rc0...+debug.zip)
- kernel build config (.config.zip)
--
Best Regards,
Mike Gavrilov.