lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9b295cad-7302-cf2c-d19d-d27fabcb48be@kernel.org>
Date:   Mon, 28 Sep 2020 14:02:14 -0400
From:   Sinan Kaya <okaya@...nel.org>
To:     "Kuppuswamy, Sathyanarayanan" 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Bjorn Helgaas <helgaas@...nel.org>
Cc:     bhelgaas@...gle.com, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org, ashok.raj@...el.com,
        Jay Vosburgh <jay.vosburgh@...onical.com>
Subject: Re: [PATCH v3 1/1] PCI/ERR: Fix reset logic in pcie_do_recovery()
 call

On 9/28/2020 1:15 PM, Kuppuswamy, Sathyanarayanan wrote:
> Since there is no state restoration for FATAL errors, I am wondering
> whether
> calls to ->error_detected(), ->mmio_enabled() and ->slot_reset() are
> required?

Good question,

Initially when we started, we were trying to handle both NON_FATAL and
FATAL errors in DPC.

We have seen value in unifying AER's callback mechanism with DPC.
It looks like this no longer applies for DPC.

Some drivers want these indication to stop outgoing DMA/timers so that
system can recover quickly.

There is value in calling them with existing AER based design.

I agree it doesn't apply here anymore if we are going to remove the
device driver. Maybe, you should stop calling pcie_do_recovery() in DPC
as well.

> 
> Let me know your comments about following pseudo code.
> 
> if (fatal error & hotplug_supported)
>    do nothing // if fatal triggered by DPC, clear DPC state.
> 
> if (fatal error & no-hotplug)
>   perform slot_reset and renumerate affected devices.

LGTM,

I apologize for calling this slot_reset but slot_reset in err.c code is
for post recovery callback to endpoint drivers. Let's not use this term
here anymore to not confuse ourselves.

remove device + rescan similar to how hotplug remove + hotplug insertion
notifications does eventually.

All of this to be done in DPC driver without any err.c involvement.

Bjorn,

What do you think? Is this a good direction?

Sinan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ