Message-ID: <403949d7-7c36-4b9a-a079-60a5aa985dd1@163.com>
Date: Fri, 23 May 2025 00:01:29 +0800
From: Hans Zhang <18255117159@....com>
To: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Cc: bhelgaas@...gle.com, tglx@...utronix.de, kw@...ux.com,
 mahesh@...ux.ibm.com, oohall@...il.com, linux-pci@...r.kernel.org,
 linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH 0/4] pci: implement "pci=aer_panic"



On 2025/5/22 19:47, Manivannan Sadhasivam wrote:
> On Sat, May 17, 2025 at 12:55:14AM +0800, Hans Zhang wrote:
>> The following series introduces a new kernel command-line option aer_panic
>> to enhance error handling for PCIe Advanced Error Reporting (AER) in
>> mission-critical environments. This feature ensures deterministic recovery
>> from fatal PCIe errors by triggering a controlled kernel panic when device
>> recovery fails, avoiding indefinite system hangs.
>>
>> Problem Statement
>> In systems where unresolved PCIe errors (e.g., bus hangs) occur,
>> traditional error recovery mechanisms may leave the system unresponsive
>> indefinitely. This is unacceptable for high-availability environments
>> that require prompt recovery via reboot.
>>
>> Solution
>> The aer_panic option forces a kernel panic on unrecoverable AER errors.
>> This bypasses prolonged recovery attempts and ensures immediate reboot.
>>
> 
> You should not panic the kernel when a PCI error occurs (even if it is a fatal
> one). You should instead try to reset the root complex. For that you need this
> series that got merged recently:
> https://lore.kernel.org/all/20250508-pcie-reset-slot-v4-0-7050093e2b50@linaro.org
> 
> PS: You need to populate the slot_reset callback in your controller driver to
> reset the controller in the event of a fatal AER error or link down.

Dear Mani,

Thank you for your reply. I will take a look at the patch series you
linked.

Best regards,
Hans

