Message-ID: <CABXGCsO8_GXZQ9tJYZJDbO7oGvsHyVS-32L1PZ7YNL0SrA1RFg@mail.gmail.com>
Date: Mon, 6 Oct 2025 00:11:38 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: "Pillai, Aurabindo" <aurabindo.pillai@....com>, nicholas.kazlauskas@....com, 
	"Wu, Ray" <ray.wu@....com>, "Wheeler, Daniel" <daniel.wheeler@....com>, roman.li@....com, 
	"Chung, ChiaHsuan (Tom)" <chiahsuan.chung@....com>, "Deucher, Alexander" <alexander.deucher@....com>
Cc: amd-gfx list <amd-gfx@...ts.freedesktop.org>, 
	Linux List Kernel Mailing <linux-kernel@...r.kernel.org>, 
	Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: 6.18/regression/bisected – BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321 after 6d31602a9f57

Hi,

After updating to a 6.18 merge-window kernel (6.18.0-0.rc0, git
e406d57be7bd), the kernel log is flooded with the following trace:
[   17.380675] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[   17.380676] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/27
[   17.380677] preempt_count: 10003, expected: 0
[   17.380678] RCU nest depth: 0, expected: 0
[   17.380679] INFO: lockdep is turned off.
[   17.380680] irq event stamp: 99468
[   17.380680] hardirqs last  enabled at (99467): [<ffffffff8739e346>] cpuidle_enter_state+0xf6/0x4e0
[   17.380682] hardirqs last disabled at (99468): [<ffffffff87398733>] common_interrupt+0x13/0xe0
[   17.380683] softirqs last  enabled at (99454): [<ffffffff83c16659>] handle_softirqs+0x579/0x840
[   17.380685] softirqs last disabled at (99437): [<ffffffff83c16a56>] __irq_exit_rcu+0x126/0x240
[   17.380687] Preemption disabled at:
[   17.380688] [<ffffffff83e262d3>] __raw_spin_lock_irq+0x23/0x90
[   17.380690] CPU: 27 UID: 0 PID: 0 Comm: swapper/27 Tainted: G W    L     ------  --- 6.18.0-0.rc0.251003ge406d57be7bd.6.fc44.x86_64+debug #1 PREEMPT(lazy)
[   17.380692] Tainted: [W]=WARN, [L]=SOFTLOCKUP
[   17.380693] Hardware name: ASRock B650I Lightning WiFi/B650I Lightning WiFi, BIOS 3.30 06/16/2025
[   17.380693] Call Trace:
[   17.380694]  <IRQ>
[   17.380695]  dump_stack_lvl+0x84/0xd0
[   17.380697]  ? __raw_spin_lock_irq+0x23/0x90
[   17.380698]  __might_resched.cold+0x213/0x29d
[   17.380700]  ? rcu_is_watching+0x15/0xe0
[   17.380702]  ? __pfx___might_resched+0x10/0x10
[   17.380704]  ? rcu_is_watching+0x15/0xe0
[   17.380706]  __kmalloc_cache_noprof+0x480/0x8d0
[   17.380707]  ? ktime_get_raw+0x27/0x170
[   17.380709]  ? schedule_dc_vmin_vmax+0x9a/0x3e0 [amdgpu]
[   17.381022]  ? ktime_get_raw+0x5c/0x170
[   17.381024]  ? schedule_dc_vmin_vmax+0x9a/0x3e0 [amdgpu]
[   17.381323]  schedule_dc_vmin_vmax+0x9a/0x3e0 [amdgpu]
[   17.381614]  dm_crtc_high_irq+0x7bf/0xbd0 [amdgpu]
[   17.381902]  ? amdgpu_dm_irq_handler+0xf3/0x2a0 [amdgpu]
[   17.382193]  amdgpu_dm_irq_handler+0x19a/0x2a0 [amdgpu]
[   17.382477]  ? rcu_is_watching+0x15/0xe0
[   17.382479]  amdgpu_irq_dispatch+0x307/0x660 [amdgpu]
[   17.382747]  ? find_held_lock+0x2b/0x80
[   17.382748]  ? local_clock_noinstr+0xf/0x130
[   17.382750]  ? __pfx_amdgpu_irq_dispatch+0x10/0x10 [amdgpu]
[   17.383004]  ? __lock_release.isra.0+0x1cb/0x340
[   17.383007]  ? timekeeping_update_from_shadow+0x3df/0x7d0
[   17.383009]  ? timekeeping_adjust+0x47/0x740
[   17.383011]  amdgpu_ih_process+0x188/0x530 [amdgpu]
[   17.383257]  ? __pfx_amdgpu_irq_handler+0x10/0x10 [amdgpu]
[   17.383499]  amdgpu_irq_handler+0x27/0xb0 [amdgpu]
[   17.383740]  ? __lock_release.isra.0+0x1cb/0x340
[   17.383742]  ? __pfx_amdgpu_irq_handler+0x10/0x10 [amdgpu]
[   17.383988]  __handle_irq_event_percpu+0x1be/0x610
[   17.383991]  handle_irq_event+0xab/0x1c0
[   17.383993]  handle_edge_irq+0x2f1/0x890
[   17.383995]  __common_interrupt+0xac/0x1a0
[   17.383997]  common_interrupt+0xb0/0xe0
[   17.383999]  </IRQ>
[   17.383999]  <TASK>
[   17.384000]  asm_common_interrupt+0x26/0x40
[   17.384001] RIP: 0010:cpuidle_enter_state+0xfc/0x4e0
[   17.384003] Code: 73 04 bf ff ff ff ff 49 89 c7 e8 ef c6 3a ff 31 ff e8 68 5d 9f fc 45 84 f6 0f 85 9e 01 00 00 e8 fa 94 db fc fb 0f 1f 44 00 00 <45> 85 ed 0f 88 69 01 00 00 48 8b 3c 24 4d 63 e5 e8 7f b2 3a ff 4c
[   17.384004] RSP: 0018:ffffc9000044fd90 EFLAGS: 00000286
[   17.384005] RAX: 000000000001848b RBX: ffff8881241f8000 RCX: ffffffff8739e346
[   17.384006] RDX: ffff888109080000 RSI: ffffffff8848deda RDI: ffffffff878d4c20
[   17.384007] RBP: ffffffff89a89520 R08: 0000000000000000 R09: 0000000000000001
[   17.384007] R10: 000000000000001b R11: 0000000000000000 R12: 0000000000000003
[   17.384008] R13: 0000000000000003 R14: 0000000000000000 R15: 000000040bb740a9
[   17.384009]  ? cpuidle_enter_state+0xf6/0x4e0
[   17.384011]  ? tick_nohz_next_event+0x14b/0x3a0
[   17.384014]  cpuidle_enter+0x4c/0xb0
[   17.384015]  cpuidle_idle_call+0x1b1/0x270
[   17.384017]  ? __pfx_cpuidle_idle_call+0x10/0x10
[   17.384019]  ? __pfx_tsc_verify_tsc_adjust+0x10/0x10
[   17.384021]  ? rcu_is_watching+0x15/0xe0
[   17.384022]  ? trace_irq_enable.constprop.0+0xc0/0x100
[   17.384024]  ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170
[   17.384026]  do_idle+0xee/0x190
[   17.384028]  cpu_startup_entry+0x53/0x70
[   17.384030]  start_secondary+0x21e/0x2c0
[   17.384031]  ? __pfx_start_secondary+0x10/0x10
[   17.384033]  common_startup_64+0x13e/0x141
[   17.384036]  </TASK>
[   17.437100] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device

The message appears roughly once per second.

git bisect points to the following commit as the first bad one:
commit 6d31602a9f57a7bb3c6c8dbde1d00af67e250a3f
Author: Aurabindo Pillai <aurabindo.pillai@....com>
Date:   Wed Apr 16 11:26:54 2025 -0400

    drm/amd/display: more liberal vmin/vmax update for freesync

    [Why]
    FAMS2 expects vmin/vmax to be updated in the case when freesync is
    off, but supported. But we only update it when freesync is enabled.

    [How]
    Change the vsync handler such that dc_stream_adjust_vmin_vmax() its called
    irrespective of whether freesync is enabled. If freesync is supported,
    then there is no harm in updating vmin/vmax registers.

    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3546
    Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@....com>
    Signed-off-by: Aurabindo Pillai <aurabindo.pillai@....com>
    Signed-off-by: Ray Wu <ray.wu@....com>
    Tested-by: Daniel Wheeler <daniel.wheeler@....com>
    Signed-off-by: Roman Li <roman.li@....com>
    Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@....com>
    Tested-by: Daniel Wheeler <daniel.wheeler@....com>
    Signed-off-by: Alex Deucher <alexander.deucher@....com>

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

Unfortunately, I could not confirm the bisect result by testing a
kernel with this commit reverted, because the revert does not apply
cleanly:
$ git revert -n 6d31602a9f57a7bb3c6c8dbde1d00af67e250a3f
Auto-merging drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
error: could not revert 6d31602a9f57... drm/amd/display: more liberal vmin/vmax update for freesync

System info:
Kernel: 6.18.0-0.rc0.251003ge406d57be7bd.6.fc44.x86_64+debug (PREEMPT lazy)
GPU: AMD Radeon RX 7900 XTX (Navi 31)
Board: ASRock B650I Lightning WiFi, BIOS 3.30 (2025-06-16)
Display(s): LG OLED42C3
Connection type: HDMI
Full hardware probe: https://linux-hardware.org/?probe=3fb21a7f94

The trace always shows schedule_dc_vmin_vmax() being called from
dm_crtc_high_irq(), which runs in hard-IRQ context (in_atomic(): 1,
irqs_disabled(): 1, and the frames sit inside the <IRQ> section).
It looks like schedule_dc_vmin_vmax() now performs a memory allocation
that is allowed to sleep (__kmalloc_cache_noprof) inside an interrupt
handler, which triggers the “sleeping function called from invalid
context” warning.
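
For reference, the preempt_count value in the splat can be decoded by
hand. The sketch below is a userspace illustration, not kernel code. It
assumes the value is printed in hex and that the non-RT field layout
from include/linux/preempt.h applies (8 preempt bits, 8 softirq bits,
4 hardirq bits). The struct and function names are made up for this
example.

```c
#include <stdint.h>

/* Assumed layout (non-RT kernels, see include/linux/preempt.h). */
#define PREEMPT_MASK  0x000000ffu
#define SOFTIRQ_MASK  0x0000ff00u
#define HARDIRQ_MASK  0x000f0000u
#define SOFTIRQ_SHIFT 8
#define HARDIRQ_SHIFT 16

/* Hypothetical helper type: the three counters packed into preempt_count. */
struct preempt_fields {
    unsigned preempt;  /* preempt_disable() nesting depth */
    unsigned softirq;  /* raw softirq field */
    unsigned hardirq;  /* hard-IRQ nesting depth */
};

/* Split a raw preempt_count value into its fields. */
struct preempt_fields decode_preempt_count(uint32_t count)
{
    struct preempt_fields f;

    f.preempt = count & PREEMPT_MASK;
    f.softirq = (count & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT;
    f.hardirq = (count & HARDIRQ_MASK) >> HARDIRQ_SHIFT;
    return f;
}
```

With the value from the log, decode_preempt_count(0x10003) gives
hardirq = 1, softirq = 0, preempt = 3: one level of hard-IRQ nesting
with preemption disabled three times, which is consistent with
in_atomic(): 1.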

This started right after commit 6d31602a9f57 (“more liberal vmin/vmax
update for freesync”).
Before that, there were no such warnings.

Aurabindo, could you please take a look?
It seems that the vmin/vmax update path is now executed inside an
interrupt context and performs a sleeping allocation. Maybe it needs
to be deferred to a workqueue, or replaced with a GFP_ATOMIC
allocation if that’s safe.
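
To illustrate the deferral idea, here is a minimal userspace model,
assuming a single producer and a single consumer with no real
concurrency. The handler side only copies parameters into a
preallocated ring and never allocates. A worker drains the ring later
from a context that may sleep. All names (vrr_request, vrr_ring,
vrr_ring_push, vrr_ring_pop) are hypothetical and not taken from
amdgpu. In the kernel the draining side would be a work item queued
with schedule_work() or queue_work(), and real code would need proper
locking or memory barriers.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical payload: the values the handler wants applied later. */
struct vrr_request {
    unsigned v_total_min;
    unsigned v_total_max;
};

/* Preallocated storage, so the handler side never calls an allocator. */
struct vrr_ring {
    struct vrr_request slots[RING_SIZE];
    size_t head;  /* next slot to write (handler side) */
    size_t tail;  /* next slot to read (worker side) */
};

/* "IRQ" side: copy the request into a free slot; never blocks. */
bool vrr_ring_push(struct vrr_ring *r, struct vrr_request req)
{
    if (r->head - r->tail == RING_SIZE)
        return false;  /* ring full: drop rather than wait */
    r->slots[r->head % RING_SIZE] = req;
    r->head++;
    return true;
}

/* Worker side: take the oldest pending request, if there is one. */
bool vrr_ring_pop(struct vrr_ring *r, struct vrr_request *out)
{
    if (r->head == r->tail)
        return false;  /* nothing pending */
    *out = r->slots[r->tail % RING_SIZE];
    r->tail++;
    return true;
}
```

The trade-off in this model is that a full ring drops the request
instead of blocking, which is the only option available to code that
must not sleep.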

I’ve attached:
- full kernel log (dmesg-6.18.0-0.rc0...+debug.zip)
- kernel build config (.config.zip)

-- 
Best Regards,
Mike Gavrilov.

Download attachment "dmesg-6.18.0-0.rc0.251003ge406d57be7bd.6.fc44.x86_64+debug.zip" of type "application/zip" (76481 bytes)

Download attachment ".config.zip" of type "application/zip" (69444 bytes)
