[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87frf3wx5s.ffs@tglx>
Date: Thu, 10 Jul 2025 23:31:11 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: himanshu.madhani@...cle.com, linux-kernel@...r.kernel.org
Cc: Himanshu Madhani <himanshu.madhani@...cle.com>, Jorge Lopez
<jorge.jo.lopez@...cle.com>, Bjorn Helgaas <helgaas@...nel.org>,
linux-pci@...r.kernel.org
Subject: Re: [PATCH] PCI/THP: Fix hang due to incorrect guard lock
On Tue, Jul 08 2025 at 22:25, himanshu madhani wrote:
> From: Himanshu Madhani <himanshu.madhani@...cle.com>
The subject line is misleading because the problem is not in the THP
code. It's in the PCI/MSI code implementation of the function which is
used by THP.
> Fix system hang due to incorrect mutex lock placement.
>
> following stack trace will be seen on system boot
Please trim down stack traces to the bare minimum which is required to
illustrate your point.
> [ 525.664681] task:systemd-shutdow state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00004002
> [ 525.796878] Call Trace:
> [ 525.826116] <TASK>
> [ 525.851195] __schedule+0x2d1/0x730
> [ 525.892917] schedule+0x27/0x80
> [ 525.930478] schedule_preempt_disabled+0x15/0x30
> [ 525.985718] __mutex_lock.constprop.0+0x4be/0x8a0
> [ 526.041993] msi_domain_get_virq+0xcc/0x110
> [ 526.092031] pci_msix_write_tph_tag+0x3c/0x100
> [ 526.145186] pcie_tph_set_st_entry+0x125/0x1d0
> [ 526.198346] bnxt_irq_affinity_release+0x35/0x50 [bnxt_en]
> [ 526.264015] irq_set_affinity_notifier+0xe0/0x130
> [ 526.320291] bnxt_free_irq+0x6e/0x110 [bnxt_en]
> [ 526.374507] __bnxt_close_nic.isra.0+0x1eb/0x220 [bnxt_en]
> [ 526.440175] bnxt_close+0x3a/0x100 [bnxt_en]
> [ 526.491264] __dev_close_many+0xae/0x220
> [ 526.538179] dev_close_many+0xc2/0x1b0
> [ 526.583014] netif_close+0x9d/0xd0
> [ 526.623693] bnxt_shutdown+0xb1/0xe0 [bnxt_en]
> [ 526.676874] pci_device_shutdown+0x35/0x70
> [ 526.725871] device_shutdown+0x118/0x1a0
You trimmed the interesting information that this is a softlockup and
kept all the gunk below whihc is completely useless.
> [ 526.772788] kernel_restart+0x3a/0x70
> [ 526.816588] __do_sys_reboot+0x150/0x250
> [ 526.863504] do_syscall_64+0x84/0x940
> [ 526.907300] ? __put_user_8+0xd/0x20
> [ 526.950059] ? rseq_ip_fixup+0x90/0x1e0
> [ 526.995937] ? task_mm_cid_work+0x1ad/0x220
> [ 527.045971] ? __rseq_handle_notify_resume+0x35/0x90
> [ 527.105367] ? arch_exit_to_user_mode_prepare.isra.0+0x98/0xb0
> [ 527.175166] ? do_syscall_64+0xba/0x940
> [ 527.221040] ? do_filp_open+0xd7/0x1a0
> [ 527.265882] ? alloc_fd+0xba/0x110
> [ 527.306556] ? do_sys_openat2+0xa4/0xf0
> [ 527.352434] ? __x64_sys_openat+0x54/0xb0
> [ 527.400389] ? arch_exit_to_user_mode_prepare.isra.0+0x9/0xb0
> [ 527.469150] ? do_syscall_64+0xba/0x940
> [ 527.515023] ? do_user_addr_fault+0x221/0x690
> [ 527.567141] ? clear_bhb_loop+0x30/0x80
> [ 527.613017] ? clear_bhb_loop+0x30/0x80
> [ 527.658895] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 527.719332] RIP: 0033:0x7fc3ec504777
> [ 527.762091] RSP: 002b:00007ffecd62c4f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a9
> [ 527.852685] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc3ec504777
> [ 527.938085] RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
> [ 528.023485] RBP: 00007ffecd62c700 R08: 0000000000000000 R09: 00007ffecd62b8e0
> [ 528.108878] R10: 0000000000000001 R11: 0000000000000202 R12: 00007ffecd62c568
> [ 528.194273] R13: 00007ffecd62c548 R14: 00007ffecd62c568 R15: 0000000000000000
> [ 528.279672] </TASK>
See https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces
> Fixes: d5124a9957b2 ("PCI/MSI: Provide a sane mechanism for TPH")
This fixes tag is correct
> Fixes: 71296eae5887 ("PCI/TPH: Replace the broken MSI-X control word update")
This one not because it's a subsequent problem caused by the above.
> Reported-by: Jorge Lopez <jorge.jo.lopez@...cle.com>
> Suggested-by: Thomas Gleixner <tglx@...utronix.de>
> Tested-by: Jorge Lopez <jorge.jo.lopez@...cle.com>
> Signed-off-by: Himanshu Madhani <himanshu.madhani@...cle.com>
Other than that this looks good.
I pick it up tomorrow through the tip irq/urgent branch and fixup the
changelog, so no need to resend.
Thanks,
tglx
Powered by blists - more mailing lists