Message-ID: <d0dc80a2-6958-5cc1-b75e-2f1dd513f826@intel.com>
Date: Mon, 2 Oct 2023 22:50:27 -0700
From: Jesse Brandeburg <jesse.brandeburg@...el.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
CC: <intel-wired-lan@...ts.osuosl.org>, <linux-pci@...r.kernel.org>,
	<pmenzel@...gen.mpg.de>, <netdev@...r.kernel.org>, <jkc@...hat.com>, "Vishal
 Agrawal" <vagrawal@...hat.com>, Przemek Kitszel
	<przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH iwl-net v2] ice: reset first in crash dump kernels

On 10/2/2023 4:49 PM, Jay Vosburgh wrote:
> Jesse Brandeburg <jesse.brandeburg@...el.com> wrote:
> 
>> When the system boots into the crash dump kernel after a panic, the ice
>> networking device may still have pending transactions that can cause errors
>> or machine checks when the device is re-enabled. This can prevent the crash
>> dump kernel from loading the driver or collecting the crash data.
>>
>> To avoid this issue, perform a function level reset (FLR) on the ice device
>> via PCIe config space before enabling it on the crash kernel. This will
>> clear any outstanding transactions and stop all queues and interrupts.
>> Restore the config space after the FLR, otherwise it was found in testing
>> that the driver wouldn't load successfully.
> 
> 	How does this differ from adding "reset_devices" to the crash
> kernel command line, per Documentation/admin-guide/kdump/kdump.rst?
> 
> 	-J
> 

Hi Jay, thanks for the question.

That parameter is new to me, and after looking into it, it doesn't
seem well documented. It also seems to be used only by storage
controllers, and honoring it would result in basically the same code I
already have. Since drivers have to opt in to the parameter, I suspect
the differences would be 1) requiring the user to pass reset_devices on
the kdump kernel command line (which is a big "if") and 2) less
readable code. The current patch does:

if (is_kdump_kernel())
...

and the reset_devices way would be:

if (reset_devices)
...

There are several other examples in the networking tree using the method
I ended up with in this change. I'd argue the preferred way in the
networking tree is to use is_kdump_kernel(), which I like better because
it doesn't require user input and shouldn't have any bad side effects
from doing an extra reset in kdump.
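
To make the comparison concrete, here is a minimal userspace sketch of
the two gating styles. Everything in it is a stand-in invented for
illustration: struct mock_pdev, mock_is_kdump_kernel, mock_reset_devices
and the helper names are hypothetical and are not the kernel APIs or the
actual patch. The comments name the kernel helpers (pci_save_state(),
pcie_flr(), pci_restore_state()) that such a sequence would presumably
map to, which is an assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a PCI device; not struct pci_dev. */
struct mock_pdev {
	bool state_saved;
	bool flr_done;
	bool state_restored;
};

/* Stand-in for is_kdump_kernel(): true without any user input. */
bool mock_is_kdump_kernel;
/* Stand-in for reset_devices: true only if the user passed the option. */
bool mock_reset_devices;

/* Save config space, reset the function, restore config space. */
int mock_flr_and_restore(struct mock_pdev *pdev)
{
	pdev->state_saved = true;	/* assumed: pci_save_state() */
	pdev->flr_done = true;		/* assumed: pcie_flr() */
	pdev->state_restored = true;	/* assumed: pci_restore_state() */
	return 0;
}

/* Style used by the patch: gate on running in the crash dump kernel. */
int probe_kdump_variant(struct mock_pdev *pdev)
{
	if (mock_is_kdump_kernel)
		return mock_flr_and_restore(pdev);
	return 0;
}

/* Alternative: gate on the command-line parameter. */
int probe_reset_devices_variant(struct mock_pdev *pdev)
{
	if (mock_reset_devices)
		return mock_flr_and_restore(pdev);
	return 0;
}

/* Crash kernel booted without reset_devices on the command line. */
void demo(void)
{
	struct mock_pdev a = { 0 }, b = { 0 };

	mock_is_kdump_kernel = true;
	mock_reset_devices = false;

	assert(probe_kdump_variant(&a) == 0);
	assert(a.flr_done && a.state_restored);	/* device was reset */

	assert(probe_reset_devices_variant(&b) == 0);
	assert(!b.flr_done);			/* reset was skipped */
}
```

In this sketch the only behavioral difference is the case demo()
exercises: a crash kernel whose command line lacks reset_devices gets
the reset in the first variant and skips it in the second.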

Also, testing has already confirmed that this patch fixes the issue.

I'd prefer to keep the patch as is, if that's ok with you.

Thanks,
Jesse
