Message-ID: <5427ede1-57a0-43d1-99f3-8ca4b0643e82@intel.com>
Date: Tue, 16 Dec 2025 22:20:56 +0530
From: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@...el.com>
To: <sathyanarayanan.kuppuswamy@...ux.intel.com>
CC: <rafael.j.wysocki@...el.com>, "intel-gfx@...ts.freedesktop.org"
<intel-gfx@...ts.freedesktop.org>, "intel-xe@...ts.freedesktop.org"
<intel-xe@...ts.freedesktop.org>, <linux-pm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <regressions@...mhuis.info>, "Kurmi, Suresh
Kumar" <suresh.kumar.kurmi@...el.com>, "Saarinen, Jani"
<jani.saarinen@...el.com>, <linux-kernel@...r.kernel.org>
Subject: REGRESSION on Linux 6.19-rc1
Hello Sathyanarayanan,
Hope you are doing well. I am Chaitanya from the Linux graphics team at
Intel.
This mail is regarding a regression we are seeing in our CI runs [1] on
the drm-tip repository.
Since the backmerge of Linux 6.19-rc1, we are seeing the following
lockdep warning on the PTL (Panther Lake) machines.
`````````````````````````````````````````````````````````````````````````````````
<4>[ 8.197433] ============================================
<4>[ 8.197437] WARNING: possible recursive locking detected
<4>[ 8.197440] 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 Not tainted
<4>[ 8.197444] --------------------------------------------
<4>[ 8.197447] cpuhp/0/20 is trying to acquire lock:
<4>[ 8.197450] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[ 8.197463]
but task is already holding lock:
<4>[ 8.197466] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[ 8.197477]
other info that might help us debug this:
<4>[ 8.197480] Possible unsafe locking scenario:
<4>[ 8.197483] CPU0
<4>[ 8.197485] ----
<4>[ 8.197487] lock(cpu_hotplug_lock);
<4>[ 8.197490] lock(cpu_hotplug_lock);
<4>[ 8.197493]
*** DEADLOCK ***
<4>[ 8.197496] May be due to missing lock nesting notation
<4>[ 8.197499] 2 locks held by cpuhp/0/20:
<4>[ 8.197503] #0: ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[ 8.197513] #1: ffffffff83489f60 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[ 8.197523]
stack backtrace:
<4>[ 8.197528] CPU: 0 UID: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 PREEMPT(voluntary)
<4>[ 8.197530] Hardware name: Intel Corporation Panther Lake Client Platform/PTL-UH LP5 T3 RVP1, BIOS PTLPFWI1.R00.3383.D10.2510222219 10/22/2025
<4>[ 8.197532] Call Trace:
<4>[ 8.197532] <TASK>
<4>[ 8.197533] dump_stack_lvl+0x91/0xf0
<4>[ 8.197537] dump_stack+0x10/0x20
<4>[ 8.197538] print_deadlock_bug+0x23f/0x320
<4>[ 8.197542] __lock_acquire+0x146e/0x2790
<4>[ 8.197548] lock_acquire+0xc4/0x2c0
<4>[ 8.197550] ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[ 8.197556] cpus_read_lock+0x41/0x110
<4>[ 8.197558] ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[ 8.197561] rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[ 8.197565] rapl_cpu_online+0x85/0x87 [intel_rapl_msr]
<4>[ 8.197568] ? __pfx_rapl_cpu_online+0x10/0x10 [intel_rapl_msr]
<4>[ 8.197570] cpuhp_invoke_callback+0x41f/0x6c0
<4>[ 8.197573] ? cpuhp_thread_fun+0x6d/0x290
<4>[ 8.197575] cpuhp_thread_fun+0x1e2/0x290
<4>[ 8.197578] ? smpboot_thread_fn+0x26/0x290
<4>[ 8.197581] smpboot_thread_fn+0x12f/0x290
<4>[ 8.197584] ? __pfx_smpboot_thread_fn+0x10/0x10
<4>[ 8.197586] kthread+0x11f/0x250
<4>[ 8.197589] ? __pfx_kthread+0x10/0x10
<4>[ 8.197592] ret_from_fork+0x344/0x3a0
<4>[ 8.197595] ? __pfx_kthread+0x10/0x10
<4>[ 8.197597] ret_from_fork_asm+0x1a/0x30
<4>[ 8.197604] </TASK>
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [2].
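One reading of the splat, written as a toy userspace C model rather than
kernel code: lockdep records cpu_hotplug_lock as held in cpuhp_thread_fun(),
and the rapl_cpu_online() callback then reaches rapl_package_add_pmu(),
which calls cpus_read_lock() again. All model_* names and the depth counter
below are hypothetical stand-ins introduced only for illustration; they do
not reproduce the actual driver or hotplug core code.

```c
#include <stdbool.h>

/* Toy model only: a depth counter stands in for cpu_hotplug_lock as
 * tracked for the current task. Nothing here is real kernel API. */
static int hotplug_lock_depth;
static bool recursion_detected;

void model_cpus_read_lock(void)
{
    /* Taking the lock while it is already held is what the splat reports. */
    if (hotplug_lock_depth > 0)
        recursion_detected = true;
    hotplug_lock_depth++;
}

void model_cpus_read_unlock(void)
{
    hotplug_lock_depth--;
}

/* Stand-in for rapl_package_add_pmu(): per the trace, it takes the lock. */
void model_add_pmu(void)
{
    model_cpus_read_lock();
    /* PMU setup would happen here. */
    model_cpus_read_unlock();
}

/* Stand-in for rapl_cpu_online(), invoked as a hotplug callback. */
void model_cpu_online(void)
{
    model_add_pmu();
}

/* Stand-in for cpuhp_thread_fun(): lock is held around the callback. */
bool model_hotplug_thread(void)
{
    recursion_detected = false;
    model_cpus_read_lock();
    model_cpu_online();
    model_cpus_read_unlock();
    return recursion_detected;
}

/* Same helper called outside any hotplug callback: no nesting. */
bool model_direct_call(void)
{
    recursion_detected = false;
    model_add_pmu();
    return recursion_detected;
}
```

In this model, model_hotplug_thread() reports the nested acquisition while
model_direct_call() does not, which matches the trace showing the second
acquisition only on the cpuhp callback path.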
After bisecting the tree, the following patch [3] appears to be the first
"bad" commit:
`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 748d6ba43afde7e9ac27443233203995cc15d235
Author: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
Date: Thu Nov 20 16:05:39 2025 -0800
powercap: intel_rapl: Enable MSR-based RAPL PMU support
`````````````````````````````````````````````````````````````````````````````````````````````````````````
We also verified that the issue is no longer seen when the patch is reverted.
Could you please check why the patch causes this regression and provide
a fix if necessary?
Thank you.
Regards
Chaitanya
[1] https://intel-gfx-ci.01.org/tree/intel-xe/combined-alt.html?
[2] https://intel-gfx-ci.01.org/tree/intel-xe/xe-4242-05b7c58b3367dca84d4745dfcac3b5d4ee142404/bat-ptl-2/boot0.txt
[3] https://cgit.freedesktop.org/drm-tip/commit/?id=748d6ba43afde7e9ac27443233203995cc15d235