Message-ID: <5427ede1-57a0-43d1-99f3-8ca4b0643e82@intel.com>
Date: Tue, 16 Dec 2025 22:20:56 +0530
From: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@...el.com>
To: <sathyanarayanan.kuppuswamy@...ux.intel.com>
CC: <rafael.j.wysocki@...el.com>, "intel-gfx@...ts.freedesktop.org"
	<intel-gfx@...ts.freedesktop.org>, "intel-xe@...ts.freedesktop.org"
	<intel-xe@...ts.freedesktop.org>, <linux-pm@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <regressions@...mhuis.info>, "Kurmi, Suresh
 Kumar" <suresh.kumar.kurmi@...el.com>, "Saarinen, Jani"
	<jani.saarinen@...el.com>, <linux-kernel@...r.kernel.org>
Subject: REGRESSION on Linux 6.19-rc1

Hello Sathyanarayanan,

Hope you are doing well. I am Chaitanya from the Linux graphics team at
Intel.

This mail is regarding a regression we are seeing in our CI runs [1] on
the drm-tip repository.

Since the backmerge of Linux 6.19-rc1, we are seeing the following
lockdep warning on the PTL machines.

`````````````````````````````````````````````````````````````````````````````````
<4>[    8.197433] ============================================
<4>[    8.197437] WARNING: possible recursive locking detected
<4>[    8.197440] 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 Not tainted
<4>[    8.197444] --------------------------------------------
<4>[    8.197447] cpuhp/0/20 is trying to acquire lock:
<4>[    8.197450] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197463]
                   but task is already holding lock:
<4>[    8.197466] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197477]
                   other info that might help us debug this:
<4>[    8.197480]  Possible unsafe locking scenario:

<4>[    8.197483]        CPU0
<4>[    8.197485]        ----
<4>[    8.197487]   lock(cpu_hotplug_lock);
<4>[    8.197490]   lock(cpu_hotplug_lock);
<4>[    8.197493]
                    *** DEADLOCK ***

<4>[    8.197496]  May be due to missing lock nesting notation

<4>[    8.197499] 2 locks held by cpuhp/0/20:
<4>[    8.197503]  #0: ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197513]  #1: ffffffff83489f60 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
<4>[    8.197523]
                   stack backtrace:
<4>[    8.197528] CPU: 0 UID: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.19.0-rc1-lgci-xe-xe-4242-05b7c58b3367dca84+ #1 PREEMPT(voluntary)
<4>[    8.197530] Hardware name: Intel Corporation Panther Lake Client Platform/PTL-UH LP5 T3 RVP1, BIOS PTLPFWI1.R00.3383.D10.2510222219 10/22/2025
<4>[    8.197532] Call Trace:
<4>[    8.197532]  <TASK>
<4>[    8.197533]  dump_stack_lvl+0x91/0xf0
<4>[    8.197537]  dump_stack+0x10/0x20
<4>[    8.197538]  print_deadlock_bug+0x23f/0x320
<4>[    8.197542]  __lock_acquire+0x146e/0x2790
<4>[    8.197548]  lock_acquire+0xc4/0x2c0
<4>[    8.197550]  ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197556]  cpus_read_lock+0x41/0x110
<4>[    8.197558]  ? rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197561]  rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197565]  rapl_cpu_online+0x85/0x87 [intel_rapl_msr]
<4>[    8.197568]  ? __pfx_rapl_cpu_online+0x10/0x10 [intel_rapl_msr]
<4>[    8.197570]  cpuhp_invoke_callback+0x41f/0x6c0
<4>[    8.197573]  ? cpuhp_thread_fun+0x6d/0x290
<4>[    8.197575]  cpuhp_thread_fun+0x1e2/0x290
<4>[    8.197578]  ? smpboot_thread_fn+0x26/0x290
<4>[    8.197581]  smpboot_thread_fn+0x12f/0x290
<4>[    8.197584]  ? __pfx_smpboot_thread_fn+0x10/0x10
<4>[    8.197586]  kthread+0x11f/0x250
<4>[    8.197589]  ? __pfx_kthread+0x10/0x10
<4>[    8.197592]  ret_from_fork+0x344/0x3a0
<4>[    8.197595]  ? __pfx_kthread+0x10/0x10
<4>[    8.197597]  ret_from_fork_asm+0x1a/0x30
<4>[    8.197604]  </TASK>
`````````````````````````````````````````````````````````````````````````````````
The detailed log can be found in [2].
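
For quick triage, the two lock lines of the splat can be compared
mechanically. The snippet below is only a rough sketch and not part of
our CI tooling: the helper names (lock_sites, is_recursive) and the
regular expression are made up for illustration, and the excerpt is
copied from the warning above with the wrapped lines rejoined. It
assumes lock names consist of word characters only, so it would not
match the "cpuhp_state-up" entry in the held-locks list.

```python
import re

# Excerpt from the lockdep warning above (wrapped lines rejoined).
SPLAT = """\
<4>[    8.197447] cpuhp/0/20 is trying to acquire lock:
<4>[    8.197450] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: rapl_package_add_pmu+0x37/0x370 [intel_rapl_common]
<4>[    8.197463]
                   but task is already holding lock:
<4>[    8.197466] ffffffff83487870 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x6d/0x290
"""

# Hypothetical pattern for lines of the form
# "<address> (<lock name>){...}-{...}, at: <symbol>+0x<offset>/...".
LOCK_RE = re.compile(
    r"([0-9a-f]{16}) \((\w+)\)\{[^}]*\}-\{[^}]*\}, at: (\w+)\+0x"
)


def lock_sites(splat):
    """Return (address, lock_name, function) for each lock line found."""
    return LOCK_RE.findall(splat)


def is_recursive(sites):
    """True if the first two lock lines refer to the same lock instance."""
    return len(sites) >= 2 and sites[0][:2] == sites[1][:2]


sites = lock_sites(SPLAT)
for addr, name, func in sites:
    print(f"{name} @ {addr} taken in {func}")
print("same lock taken twice:", is_recursive(sites))
```

On this excerpt, both entries resolve to cpu_hotplug_lock at the same
address: the first is taken in rapl_package_add_pmu and the second is
already held in cpuhp_thread_fun, which matches what lockdep reports.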

After bisecting the tree, the following patch [3] seems to be the first
"bad" commit:

`````````````````````````````````````````````````````````````````````````````````````````````````````````
commit 748d6ba43afde7e9ac27443233203995cc15d235
Author: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
Date:   Thu Nov 20 16:05:39 2025 -0800

     powercap: intel_rapl: Enable MSR-based RAPL PMU support
`````````````````````````````````````````````````````````````````````````````````````````````````````````

We also verified that the issue is not seen when the patch is reverted.

Could you please check why the patch causes this regression and provide 
a fix if necessary?

Thank you.

Regards

Chaitanya

[1] https://intel-gfx-ci.01.org/tree/intel-xe/combined-alt.html?
[2] https://intel-gfx-ci.01.org/tree/intel-xe/xe-4242-05b7c58b3367dca84d4745dfcac3b5d4ee142404/bat-ptl-2/boot0.txt
[3] https://cgit.freedesktop.org/drm-tip/commit/?id=748d6ba43afde7e9ac27443233203995cc15d235
