Message-ID: <aI_fUhpBrIBrJ073@debian.local>
Date: Sun, 3 Aug 2025 23:14:42 +0100
From: Chris Bainbridge <chris.bainbridge@...il.com>
To: linux-kernel@...r.kernel.org
Cc: surenb@...gle.com, bsegall@...gle.com, dietmar.eggemann@....com,
mingo@...hat.com, hannes@...xchg.org, juri.lelli@...hat.com,
mgorman@...e.de, peterz@...radead.org, rostedt@...dmis.org,
vschneid@...hat.com, vincent.guittot@...aro.org,
regressions@...ts.linux.dev
Subject: [REGRESSION] intermittent psi_avgs_work soft lockup
Hello,
I'm getting intermittent soft lockups with recent kernel builds. This is
a new error that I haven't seen before.
An example lockup from 6.16.0-08685-g260f6f4fda93:
[39389.154516] iwlwifi 0000:01:00.0: Queue 3 is stuck 4977 5129
[39400.400429] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:1751316]
[39400.400433] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat x_tables nf_tables br_netfilter bridge stp llc ccm overlay qrtr rfcomm cmac algif_hash algif_skcipher af_alg bnep binfmt_misc ext4 mbcache jbd2 nls_ascii nls_cp437 vfat fat snd_hda_codec_generic snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common iwlmvm snd_hda_intel snd_acp3x_pdm_dma snd_soc_dmic snd_acp3x_rn kvm_amd snd_hda_codec uvcvideo snd_soc_core mac80211 snd_usb_audio btusb snd_intel_dspcfg snd_compress videobuf2_vmalloc snd_usbmidi_lib btrtl libarc4 videobuf2_memops kvm snd_rawmidi snd_hwdep snd_pci_acp6x btintel uvc snd_seq_device snd_hda_core snd_pci_acp5x btbcm videobuf2_v4l2 irqbypass snd_pcm btmtk iwlwifi snd_rn_pci_acp3x sg videodev rapl snd_timer videobuf2_common wmi_bmof ee1004 snd_acp_config pcspkr bluetooth cfg80211 snd_soc_acpi k10temp snd mc snd_pci_acp3x soundcore ccp rfkill ac
[39400.400478] battery acpi_tad amd_pmc joydev evdev msr parport_pc ppdev lp parport efi_pstore fuse nvme_fabrics configfs nfnetlink efivarfs autofs4 crc32c_cryptoapi btrfs blake2b_generic xor raid6_pq hid_microsoft ff_memless hid_cmedia r8153_ecm cdc_ether usbnet r8152 mii libphy mdio_bus usbhid dm_crypt dm_mod sd_mod uas usb_storage scsi_mod scsi_common amdgpu drm_client_lib i2c_algo_bit drm_ttm_helper ttm drm_panel_backlight_quirks drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched hid_multitouch drm_display_helper ucsi_acpi hid_generic drm_kms_helper typec_ucsi sp5100_tco roles xhci_pci cec i2c_hid_acpi watchdog typec xhci_hcd amd_sfh i2c_hid rc_core nvme i2c_piix4 thunderbolt video usbcore ghash_clmulni_intel serio_raw hid crc16 nvme_core fan i2c_smbus usb_common button wmi drm aesni_intel
[39400.400514] irq event stamp: 28884
[39400.400515] hardirqs last enabled at (28883): [<ffffffffb6200dc6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
[39400.400521] hardirqs last disabled at (28884): [<ffffffffb71185fa>] sysvec_apic_timer_interrupt+0xa/0xc0
[39400.400526] softirqs last enabled at (28882): [<ffffffffb64f934d>] __irq_exit_rcu+0xcd/0x140
[39400.400530] softirqs last disabled at (28877): [<ffffffffb64f934d>] __irq_exit_rcu+0xcd/0x140
[39400.400533] CPU: 2 UID: 0 PID: 1751316 Comm: kworker/2:1 Not tainted 6.16.0-08685-g260f6f4fda93 #489 PREEMPT(voluntary)
[39400.400535] Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.17 12/18/2024
[39400.400537] Workqueue: events psi_avgs_work
[39400.400541] RIP: 0010:collect_percpu_times+0x2d5/0x440
[39400.400543] Code: 00 00 00 00 00 41 8b 0c 94 48 0f af c8 48 01 4c d5 00 48 83 c2 01 48 83 fa 06 75 e9 8d 53 01 e9 aa fd ff ff f3 90 48 8b 3c 24 <48> 8b 14 fd 20 d0 6d b7 48 01 c2 8b 12 f6 c2 01 0f 84 ab fe ff ff
[39400.400545] RSP: 0018:ffffc06b07823cf8 EFLAGS: 00000202
[39400.400546] RAX: ffffffffb82abc80 RBX: ffffe06aff48f440 RCX: 0000000000000006
[39400.400548] RDX: 00000000000014b7 RSI: ffffffffb76b7293 RDI: 000000000000000d
[39400.400548] RBP: ffffc06b07823d70 R08: 0000000000000001 R09: 0000000000000000
[39400.400549] R10: 0000000000000001 R11: 0000000000000003 R12: ffffc06b07823d50
[39400.400550] R13: ffffe06aff48f454 R14: 000000000000000d R15: ffffffffb82abc80
[39400.400551] FS: 0000000000000000(0000) GS:ffff9d9f4e072000(0000) knlGS:0000000000000000
[39400.400552] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[39400.400553] CR2: 00000c2100382000 CR3: 0000000387c3b000 CR4: 0000000000750ef0
[39400.400554] PKRU: 55555554
[39400.400555] Call Trace:
[39400.400557] <TASK>
[39400.400571] psi_avgs_work+0x56/0xe0
[39400.400576] process_one_work+0x22b/0x5b0
[39400.400588] worker_thread+0x1d6/0x3c0
[39400.400592] ? bh_worker+0x260/0x260
[39400.400594] kthread+0x115/0x260
[39400.400599] ? kthreads_online_cpu+0x120/0x120
[39400.400603] ret_from_fork+0x231/0x2a0
[39400.400606] ? kthreads_online_cpu+0x120/0x120
[39400.400610] ret_from_fork_asm+0x11/0x20
[39400.400621] </TASK>
[39400.404429] watchdog: BUG: soft lockup - CPU#4 stuck for 21s! [kworker/4:0:1751752]
It seems to happen at random, typically when I return to the laptop
after being away from it for a while, or after leaving it on overnight.
It also appears to occur on roughly 2% of boots. Bisecting with such a
low failure rate takes a long time, because a commit can only be marked
good after many clean boots. I haven't identified the bad commit yet,
but I think I have narrowed it down to the range between v6.16-rc6
(good) and v6.16-rc6-79-g44e4e0297c3c (bad). At this rate, I expect to
have a more precise bisect result within a week.
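A rough estimate of the bisect cost, as a sketch only. It assumes the ~2% rate is per boot and independent between boots, and it picks an arbitrary 5% tolerance for wrongly marking a bad commit as good. The helper names boots_needed and bisect_steps are hypothetical. Under those assumptions, each "good" verdict needs about 149 clean boots, and the 79-commit range needs roughly 7 bisection steps.

```python
import math


def boots_needed(p_fail: float, miss: float) -> int:
    """Smallest n with (1 - p_fail)**n <= miss, i.e. the number of clean
    boots needed before calling a commit "good" with at most `miss`
    probability of having missed an intermittent failure."""
    return math.ceil(math.log(miss) / math.log(1.0 - p_fail))


def bisect_steps(commits: int) -> int:
    """Approximate upper bound on bisection steps for `commits` candidates."""
    return math.ceil(math.log2(commits))


# Assumptions (hypothetical): ~2% failure rate per boot, a 5% tolerated
# chance of a wrong "good" verdict, and the 79-commit range
# v6.16-rc6..v6.16-rc6-79-g44e4e0297c3c.
print(boots_needed(0.02, 0.05))  # 149
print(bisect_steps(79))          # 7
```

"Bad" verdicts are much cheaper, since a single observed lockup settles that step. The total time therefore depends mostly on how many steps land on good commits.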
#regzbot introduced: v6.16-rc6..v6.16-rc6-79-g44e4e0297c3c