Message-ID: <34f4a0d7-f834-499e-8747-936107510f99@oss.qualcomm.com>
Date: Sun, 3 Aug 2025 10:50:14 -0700
From: Jeff Johnson <jeff.johnson@....qualcomm.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>
Cc: linux-kernel@...r.kernel.org, Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
        Tejun Heo <tj@...nel.org>, Valentin Schneider <vschneid@...hat.com>,
        Shrikanth Hegde <sshegde@...ux.ibm.com>
Subject: Re: [GIT PULL] Scheduler updates for v6.17

On 8/2/25 11:43, Linus Torvalds wrote:
> On Wed, 30 Jul 2025 at 20:31, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> On Sun, 27 Jul 2025 at 23:48, Ingo Molnar <mingo@...nel.org> wrote:
>>>
>>> PSI:
>>>
>>>  - Improve scalability by optimizing psi_group_change() cpu_clock() usage
>>>    (Peter Zijlstra)
>>
>> I suspect this is buggy.
>>
>> Maybe this is coincidence, but that sounds very unlikely:
>>
>>   watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:3:7996]
>>   CPU#0 Utilization every 4s during lockup:
> 
> Happened again this morning, and as far as I can tell the machine was
> just sitting there idle at the desktop.
> 
> I've only seen this on my laptop, so maybe it's some hw dependency,
> but it *really* smells like commit 570c8efd5eb7 ("sched/psi: Optimize
> psi_group_change() cpu_clock() usage") from the symptoms. It's
> literally hanging on that psi_read_begin(), which is that
> read_seqcount_begin() on that new per-cpu psi_seq counter.
> 
> Now, I'm not seeing how it could possibly trigger - I looked through
> all the psi_write_begin() users, and they all *seem* to be (a) under
> rq_lock_irq and (b) paired with a psi_write_end() with the same cpu.
> 
> But the symptoms have been very consistent both times it happened: the
> watchdog always reports the RIP in collect_percpu_times(), always at
> that 'pause' in the "wait for seqcount to be even".
> 
> It's typically been in that psi_avgs_work kworker, but once it was
> systemd-oomd that apparently had done a "read()" on it, so it went
> through "psi_show()" instead.
> 
> Now, the *writers* all take the proper locks, but the readers don't.
> And my laptop has CONFIG_PREEMPT_VOLUNTARY in its config (random old
> setting).
> 
> I'm not seeing why that would matter, since the seq count should
> become even at some point, but it does mean that the seqcount read
> loop looks like it's an endless kernel loop when it triggers. I don't
> see how that would make a difference, since the seqcount should become
> even on the writer side and the writers shouldn't be preempted and get
> some kind of priority inversion with a reader that doesn't go away,
> but *if* there is some bug in this area, maybe that config is why I'm
> seeing it and others aren't?
> 
> Any ideas, people?
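
For anyone following the quoted analysis, here is a rough userspace sketch of
the seqcount pattern being discussed: the writer makes the counter odd while it
updates, and the reader waits for an even value before it reads. This is only
an illustration built on C11 atomics. It is not the kernel's seqcount_t code
and not the psi implementation; every name in it (seq_sketch, sketch_*) is
hypothetical, and the spin limit is added purely so the "counter stays odd"
case can be observed without hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-cpu sequence counter plus its data. */
struct seq_sketch {
    atomic_uint seq;     /* odd while a write is in progress */
    uint64_t    payload; /* data the counter protects (illustrative) */
};

/* Writer side: even -> odd. Assumes the caller already serializes writers. */
static void sketch_write_begin(struct seq_sketch *s)
{
    unsigned int old =
        atomic_fetch_add_explicit(&s->seq, 1, memory_order_acq_rel);
    assert((old & 1) == 0); /* a nested or unpaired begin would trip this */
}

/* Writer side: odd -> even. Must pair with a begin on the same counter. */
static void sketch_write_end(struct seq_sketch *s)
{
    unsigned int old =
        atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
    assert((old & 1) == 1);
}

/*
 * Reader side: wait for an even counter. An unbounded version of this loop
 * spins forever if a writer never reaches its matching end; here we give up
 * after max_spins and return false instead.
 */
static bool sketch_read_begin(struct seq_sketch *s, unsigned int max_spins,
                              unsigned int *seq_out)
{
    for (unsigned int i = 0; i < max_spins; i++) {
        unsigned int seq =
            atomic_load_explicit(&s->seq, memory_order_acquire);
        if ((seq & 1) == 0) {
            *seq_out = seq;
            return true;
        }
    }
    return false;
}

/* Reader side: true if a writer ran since read_begin, so the read must retry. */
static bool sketch_read_retry(struct seq_sketch *s, unsigned int start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}
```

In this sketch, a begin without a matching end on the same counter leaves the
value odd, and every later sketch_read_begin() fails, which is the userspace
analogue of a reader stuck in the "wait for seqcount to be even" loop.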

FWIW I'm seeing the same thing.

Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [kworker/3:0:3977]
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: Modules linked in: snd_seq_dummy snd_hrtimer ccm michael_mic bnep amdgpu snd_hda_codec_hdmi amdxcp gpu_sched drm_panel_backlight_quirks rmi_smbus rmi_core qrtr_mhi snd_hda_codec_generic at24 intel_rapl_msr binfmt_misc snd_hda_intel snd_hda_codec intel_rapl_common mei_hdcp snd_hda_core x86_pkg_temp_thermal qrtr snd_intel_dspcfg snd_intel_sdw_acpi intel_powerclamp snd_hwdep snd_pcm uvcvideo ath12k coretemp videobuf2_vmalloc qmi_helpers ghash_clmulni_intel nls_iso8859_1 aesni_intel uvc rapl snd_seq_midi videobuf2_memops mac80211 wmi_bmof snd_seq_midi_event libarc4 intel_cstate i2c_i801 videobuf2_v4l2 i915 i2c_mux radeon snd_rawmidi videobuf2_common drm_ttm_helper drm_buddy cfg80211 drm_exec i2c_smbus videodev ttm btusb drm_suballoc_helper snd_seq mc drm_client_lib btrtl btintel drm_display_helper btbcm mhi snd_seq_device btmtk cec snd_timer rc_core drm_kms_helper bluetooth mei_me snd lpc_ich mei i2c_algo_bit soundcore wireless_hotkey tpm_infineon input_leds joydev mac_hid serio_raw msr parport_pc ppdev lp
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  parport efi_pstore drm nfnetlink dmi_sysfs autofs4 rtsx_pci_sdmmc video cdc_ether usbnet mii psmouse ahci rtsx_pci libahci e1000e wmi
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: irq event stamp: 198926
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: hardirqs last  enabled at (198925): [<ffffffffa240150a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: hardirqs last disabled at (198926): [<ffffffffa5714d90>] sysvec_apic_timer_interrupt+0x10/0xb0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: softirqs last  enabled at (198904): [<ffffffffa29a4ff3>] __irq_exit_rcu+0xb3/0xe0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: softirqs last disabled at (198899): [<ffffffffa29a4ff3>] __irq_exit_rcu+0xb3/0xe0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: CPU: 3 UID: 0 PID: 3977 Comm: kworker/3:0 Not tainted 6.16.0+ #146 PREEMPT(voluntary) 
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: Hardware name: Hewlett-Packard HP ZBook 14 G2/2216, BIOS M71 Ver. 01.31 02/24/2020
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: Workqueue: events psi_avgs_work
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: RIP: 0010:collect_percpu_times+0x77a/0xe80
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: Code: 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 54 24 68 49 c7 c1 00 b0 51 a8 49 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 4c 01 c2 f3 90 <49> 81 ff 00 20 00 00 0f 83 93 04 00 00 80 3a 00 0f 85 38 06 00 00
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: RSP: 0018:ffff888132d3f9f0 EFLAGS: 00000202
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: RAX: 0000000000000003 RBX: ffffe8ffffdf65c0 RCX: 0000000000000000
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: RDX: fffffbfff4c7018b RSI: 0000000000000000 RDI: ffffffffa2b19025
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: RBP: fffffbfff4c7018b R08: dffffc0000000000 R09: ffffffffa851b000
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffa851b000
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: R13: 000000000000085b R14: dffffc0000000000 R15: 0000000000000003
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: FS:  0000000000000000(0000) GS:ffff888467291000(0000) knlGS:0000000000000000
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: CR2: 00007f3350001158 CR3: 00000001040bc001 CR4: 00000000003706f0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel: Call Trace:
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  <TASK>
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? do_raw_spin_lock+0x12d/0x270
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? __pfx_collect_percpu_times+0x10/0x10
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? __pfx___mutex_lock+0x10/0x10
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? _raw_spin_unlock_irqrestore+0x27/0x60
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  psi_avgs_work+0x96/0x200
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? lock_acquire+0x154/0x2d0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? __pfx_psi_avgs_work+0x10/0x10
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? lock_release+0xc6/0x2a0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  process_one_work+0x86e/0x14b0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? __pfx_process_one_work+0x10/0x10
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? assign_work+0x16c/0x240
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  worker_thread+0x5d0/0xfc0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? __pfx_worker_thread+0x10/0x10
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  kthread+0x375/0x750
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? __pfx_kthread+0x10/0x10
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? ret_from_fork+0x1f/0x2f0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? lock_release+0xc6/0x2a0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? __pfx_kthread+0x10/0x10
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ret_from_fork+0x215/0x2f0
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ? __pfx_kthread+0x10/0x10
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  ret_from_fork_asm+0x1a/0x30
Aug 03 10:17:26 qca-HP-ZBook-14-G2 kernel:  </TASK>

Just a bit before that, in case it matters (this sequence occurred 3 times):
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: The canary thread is apparently starving. Taking action.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Demoting known real-time threads.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Successfully demoted thread 1861 of process 1789.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Successfully demoted thread 1556 of process 1506.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Successfully demoted thread 1567 of process 1505.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Successfully demoted thread 1505 of process 1505.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Successfully demoted thread 1568 of process 1510.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Successfully demoted thread 1510 of process 1510.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Successfully demoted thread 1559 of process 1509.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Successfully demoted thread 1509 of process 1509.
Aug 03 10:14:02 qca-HP-ZBook-14-G2 rtkit-daemon[1557]: Demoted 8 threads.


