Message-ID: <018007dd-68d9-4e16-b605-15d9c77ea13f@zhaoxin.com>
Date: Fri, 6 Feb 2026 16:13:30 +0800
From: LeoLiu-oc <LeoLiu-oc@...oxin.com>
To: Lukas Wunner <lukas@...ner.de>
CC: Bjorn Helgaas <helgaas@...nel.org>, <mahesh@...ux.ibm.com>,
<oohall@...il.com>, <bhelgaas@...gle.com>, <linuxppc-dev@...ts.ozlabs.org>,
<linux-pci@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<CobeChen@...oxin.com>, <TonyWWang@...oxin.com>, <ErosZhang@...oxin.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>, Przemek Kitszel
<przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH] PCI: dpc: Increase pciehp waiting time for DPC recovery
On 2026/2/4 10:10, LeoLiu-oc wrote:
>
>
> On 2026/2/2 17:02, Lukas Wunner wrote:
>>
>>
>>
>> [cc += Tony, Przemek (ice driver maintainers), start of thread is here:
>> https://lore.kernel.org/all/20260123104034.429060-1-LeoLiu-oc@zhaoxin.com/
>> ]
>>
>> On Mon, Feb 02, 2026 at 02:00:55PM +0800, LeoLiu-oc wrote:
>>> The kernel version I am using is 6.18.6.
>> [...]
>>> The complete log of the kernel panic is as follows:
>>>
>>> [ 100.304077][ T843] list_del corruption, ffff8881418b79e8->next is LIST_POISON1 (dead000000000100)
>>> [ 100.312989][ T843] ------------[ cut here ]------------
>>> [ 100.318268][ T843] kernel BUG at lib/list_debug.c:56!
>>> [ 100.323380][ T843] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>> [ 100.329250][ T843] CPU: 7 PID: 843 Comm: irq/27-pciehp Tainted: P W OE ------- ---- 6.6.0-32.7.v2505.ky11.x86_64 #1
>>> [ 100.340793][ T843] Source Version: 71d5b964051132b7772acd935972fca11462bbfe
>>> [ 100.359228][ T843] RIP: 0010:__list_del_entry_valid_or_report+0x7f/0xc0
>>> [ 100.365877][ T843] Code: 66 4b a6 e8 c3 43 a9 ff 0f 0b 48 89 fe 48 c7 c7 10 67 4b a6 e8 b2 43 a9 ff 0f 0b 48 89 fe 48 c7 c7 40 67 4b a6 e8 a1 43 a9 ff <0f> 0b 48 89 fe 48 89 ca 48 c7 c7 78 67 4b a6 e8 8d 43 a9 ff 0f 0b
>>> [ 100.385158][ T843] RSP: 0018:ffffc9000f70fc08 EFLAGS: 00010246
>>> [ 100.391024][ T843] RAX: 000000000000004e RBX: ffff8881418b79e8 RCX: 0000000000000000
>>> [ 100.398781][ T843] RDX: 0000000000000000 RSI: ffff8897df5a32c0 RDI: ffff8897df5a32c0
>>> [ 100.406538][ T843] RBP: ffff8881257f9608 R08: 0000000000000000 R09: 0000000000000003
>>> [ 100.414294][ T843] R10: ffffc9000f70fa90 R11: ffffffffa6fee508 R12: 0000000000000000
>>> [ 100.422050][ T843] R13: ffff8881257f9608 R14: ffff888116507c28 R15: ffff888116507c28
>>> [ 100.429807][ T843] FS: 0000000000000000(0000) GS:ffff8897df580000(0000) knlGS:0000000000000000
>>> [ 100.438511][ T843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 100.444891][ T843] CR2: 00007f9563bac1c0 CR3: 0000000c4be26004 CR4: 0000000000570ee0
>>> [ 100.452647][ T843] PKRU: 55555554
>>> [ 100.456017][ T843] Call Trace:
>>> [ 100.459129][ T843] <TASK>
>>> [ 100.461898][ T843] ice_flow_rem_entry_sync.constprop.0+0x1c/0x90 [ice]
>>> [ 100.468663][ T843] ice_flow_rem_entry+0x3d/0x60 [ice]
>>> [ 100.473925][ T843] ice_fdir_erase_flow_from_hw.constprop.0+0x9b/0x100 [ice]
>>> [ 100.481078][ T843] ice_fdir_rem_flow.constprop.0+0x32/0xb0 [ice]
>>> [ 100.487284][ T843] ice_vsi_manage_fdir+0x7b/0xb0 [ice]
>>> [ 100.492629][ T843] ice_deinit_features.part.0+0x46/0xc0 [ice]
>>> [ 100.498571][ T843] ice_remove+0xcf/0x220 [ice]
>>> [ 100.503222][ T843] pci_device_remove+0x3f/0xb0
>>> [ 100.507798][ T843] device_release_driver_internal+0x19d/0x220
>>> [ 100.513667][ T843] pci_stop_bus_device+0x6c/0x90
>>> [ 100.518417][ T843] pci_stop_and_remove_bus_device+0x12/0x20
>>> [ 100.524110][ T843] pciehp_unconfigure_device+0x9f/0x160
>>> [ 100.529463][ T843] pciehp_disable_slot+0x69/0x130
>>> [ 100.534296][ T843] pciehp_handle_presence_or_link_change+0xfc/0x210
>>> [ 100.540678][ T843] pciehp_ist+0x204/0x230
>>> [ 100.544824][ T843] ? __pfx_irq_thread_fn+0x10/0x10
>>> [ 100.549747][ T843] irq_thread_fn+0x20/0x60
>>> [ 100.553978][ T843] irq_thread+0xfb/0x1c0
>>> [ 100.558038][ T843] ? __pfx_irq_thread_dtor+0x10/0x10
>>> [ 100.563130][ T843] ? __pfx_irq_thread+0x10/0x10
>>> [ 100.567791][ T843] kthread+0xe5/0x120
>>> [ 100.571594][ T843] ? __pfx_kthread+0x10/0x10
>>> [ 100.575997][ T843] ret_from_fork+0x17a/0x1a0
>>> [ 100.580403][ T843] ? __pfx_kthread+0x10/0x10
>>> [ 100.584805][ T843] ret_from_fork_asm+0x1a/0x30
>>> [ 100.589384][ T843] </TASK>
>>> [ 100.592237][ T843] Modules linked in: zxmem(OE) einj amdgpu amdxcp
>>> gpu_sched drm_exec drm_buddy nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
>>> nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
>>> nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 zhaoxin_cputemp
>>> nf_defrag_ipv4 zhaoxin_rng snd_hda_codec_hdmi radeon rfkill
>>> snd_hda_intel snd_intel_dspcfg irdma i2c_algo_bit snd_intel_sdw_acpi
>>> ip_set i40e drm_suballoc_helper nf_tables drm_ttm_helper pcicfg(POE)
>>> snd_hda_codec ib_uverbs sunrpc ttm ib_core snd_hda_core
>>> drm_display_helper snd_hwdep kvm_intel snd_pcm cec vfat fat
>>> drm_kms_helper snd_timer kvm video ice snd psmouse soundcore wmi
>>> acpi_cpufreq pcspkr i2c_zhaoxin sg sch_fq_codel drm fuse backlight
>>> nfnetlink xfs sd_mod t10_pi sm2_zhaoxin_gmi crct10dif_pclmul
>>> crc32_pclmul ahci crc32c_intel libahci r8169 ghash_clmulni_intel libata
>>> sha512_ssse3 serio_raw realtek dm_mirror dm_region_hash dm_log
>>> dm_multipath dm_mod i2c_dev autofs4
>>> [ 100.674508][ T843] ---[ end trace 0000000000000000 ]---
>>> [ 100.709547][ T843] RIP: 0010:__list_del_entry_valid_or_report+0x7f/0xc0
>>> [ 100.716197][ T843] Code: 66 4b a6 e8 c3 43 a9 ff 0f 0b 48 89 fe 48 c7 c7 10 67 4b a6 e8 b2 43 a9 ff 0f 0b 48 89 fe 48 c7 c7 40 67 4b a6 e8 a1 43 a9 ff <0f> 0b 48 89 fe 48 89 ca 48 c7 c7 78 67 4b a6 e8 8d 43 a9 ff 0f 0b
>>> [ 100.735491][ T843] RSP: 0018:ffffc9000f70fc08 EFLAGS: 00010246
>>> [ 100.741367][ T843] RAX: 000000000000004e RBX: ffff8881418b79e8 RCX: 0000000000000000
>>> [ 100.749137][ T843] RDX: 0000000000000000 RSI: ffff8897df5a32c0 RDI: ffff8897df5a32c0
>>> [ 100.756909][ T843] RBP: ffff8881257f9608 R08: 0000000000000000 R09: 0000000000000003
>>> [ 100.764678][ T843] R10: ffffc9000f70fa90 R11: ffffffffa6fee508 R12: 0000000000000000
>>> [ 100.772448][ T843] R13: ffff8881257f9608 R14: ffff888116507c28 R15: ffff888116507c28
>>> [ 100.780218][ T843] FS: 0000000000000000(0000) GS:ffff8897df580000(0000) knlGS:0000000000000000
>>> [ 100.788934][ T843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 100.795329][ T843] CR2: 00007f9563bac1c0 CR3: 0000000c4be26004 CR4: 0000000000570ee0
>>> [ 100.803099][ T843] PKRU: 55555554
>>> [ 100.806483][ T843] Kernel panic - not syncing: Fatal exception
>>> [ 100.812794][ T843] Kernel Offset: disabled
>>> [ 100.821613][ T843] pstore: backend (erst) writing error (-28)
>>> [ 100.827481][ T843] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>
>>> The reason for this kernel panic is that the ice network card driver
>>> executed the ice_pci_err_detected() for a longer time than the maximum
>>> waiting time allowed by pciehp. After that, the pciehp_ist() will
>>> execute the ice network card driver's ice_remove() process. This results
>>> in the ice_pci_err_detected() having already deleted the list, while the
>>> ice_remove() is still attempting to delete a list that no longer exists.
>>
>> This is a bug in the ice driver, not in the pciehp or dpc driver.
>> As such, it is not a good argument to support the extension of the
>> timeout. I'm not against extending the timeout, but the argument
>> that it's necessary to avoid occurrence of a bug is not a good one.
>>
>> You should first try to unbind the ice driver at runtime to see if
>> there is a general problem in the unbind code path:
>>
>> echo abcd:ef:gh.i > /sys/bus/pci/drivers/ice/unbind
>>
>> Replace abcd:ef:gh.i with the domain/bus/device/function of the Ethernet
>> card. The dmesg excerpt you've provided unfortunately does not betray
>> the card's address.
>>
>> Then try to rebind the driver via the "bind" sysfs attribute.
>>
Sorry, I did not mean to ignore your question; I skipped it because these
issues are not the cause of the kernel panic. I have previously run a
test where I first unbound the ice network card driver and then rebound
it, and there was no problem with that.
>> If this works, the next thing to debug is whether the driver has a
>> problem with surprise removal. I'm not fully convinced that the
>> crash you're seeing is caused by concurrent execution of
>> ice_pci_err_detected() and ice_remove(). When pciehp unbinds the
>> driver during DPC recovery, the device is likely inaccessible.
>> It's possible that ice_remove() behaves differently for an
>> inaccessible device and that may cause the crash instead of the
>> concurrent execution of ice_pci_err_detected().
>>
I was able to power off and then power back on the slot holding the ice
network card through the sysfs interface without any issues, for example:

echo 0 > /sys/bus/pci/slots/[slot number]/power
echo 1 > /sys/bus/pci/slots/[slot number]/power

Performing DPC recovery on its own for the slot holding the ice network
card is also fine.
Only when both DPC and hotplug are enabled simultaneously on that slot
does the DPC recovery test run into problems, such as the device becoming
unavailable and kernel panics.
I had previously confirmed this by examining the list-deletion code path
with a core dump. The cause of the kernel panic is exactly as I described
before: ice_pci_err_detected() in the ice network card driver ran for
longer than the maximum waiting time allowed by pciehp, after which
pciehp_ist() went on to run the driver's ice_remove() path. As a result,
ice_pci_err_detected() had already deleted the list entries, while
ice_remove() was still attempting to delete entries that no longer exist.
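To illustrate the failure mode, here is a minimal sketch (hypothetical:
it assumes CONFIG_DEBUG_LIST=y, and the names below are made up rather
than the actual ice structures). The first list_del() poisons ->next with
LIST_POISON1, so a second list_del() on the same entry trips
__list_del_entry_valid() and produces exactly the "list_del corruption,
...->next is LIST_POISON1" BUG shown in the log above:

  #include <linux/list.h>

  static LIST_HEAD(demo_list);        /* stand-in for the ice flow entry list */
  static struct list_head demo_entry; /* stand-in for a single flow entry     */

  static void demo_double_del(void)
  {
          list_add(&demo_entry, &demo_list);
          list_del(&demo_entry);  /* removal during error-detected teardown  */
          list_del(&demo_entry);  /* second removal from ice_remove(): BUG   */
  }
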
> The fundamental cause of this problem is that the network driver took
> longer to execute ice_pci_err_detected() than the maximum time
> (4 seconds) that pciehp_ist() waits for DPC recovery. This forced
> pciehp_disable_slot() to run even though it should not have, while
> pcie_do_recovery() was still executing. The resulting race between
> pciehp_disable_slot() and pcie_do_recovery() leads to the device
> becoming unavailable and can cause kernel crashes.
>
To add some information on this question: this may be a problem that
affects a whole class of PCIe devices rather than just the ice network
card driver. We should therefore address the issue at the PCIe driver
architecture level, so that other PCIe devices do not run into the same
problem.
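For reference, the synchronization point in question is, as far as I
understand the mainline code in drivers/pci/pcie/dpc.c (the exact code in
newer kernels may differ), the bounded wait that pciehp performs for DPC
recovery before it treats the Link Down as a real removal:

  /* Rough excerpt of pci_dpc_recovered() from drivers/pci/pcie/dpc.c.
   * pciehp_ist() ends up waiting here; once the 4 second timeout expires,
   * the slot is brought down even though pcie_do_recovery() may still be
   * running the device driver's error handlers.
   */
  wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
                     msecs_to_jiffies(4000));

This 4 second bound is the waiting time that this patch proposes to
increase.
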
>> It would also be good to understand why DPC recovery of the Ethernet
>> card takes this long. Does it take a long time to come out of reset?
>> Could the ice driver be changed to allow for faster recovery?
>>
> Based on the current observations, the execution of
> ice_pci_err_detected() in the ice network card driver takes a very long
> time, which the synchronization between the PCIe hotplug driver and DPC
> recovery cannot tolerate.
>
Based on the previous debugging results, the long execution time of
ice_pci_err_detected() in the ice driver is mainly due to the irdma
driver associated with the ice network card.
> Yours sincerely,
> LeoLiu-oc
>
>> Thanks,
>>
>> Lukas
>