Date:   Tue, 2 Jan 2018 08:25:08 -0500
From:   Sinan Kaya <okaya@...eaurora.org>
To:     Keith Busch <keith.busch@...el.com>,
        Oza Pawandeep <poza@...eaurora.org>
Cc:     Bjorn Helgaas <bhelgaas@...gle.com>,
        Philippe Ombredanne <pombredanne@...b.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        Dongdong Liu <liudongdong3@...wei.com>,
        Gabriele Paoloni <gabriele.paoloni@...wei.com>,
        Wei Zhang <wzhang@...com>, Timur Tabi <timur@...eaurora.org>
Subject: Re: [PATCH v2 2/4] PCI/DPC/AER: Address Concurrency between AER and
 DPC

Hi Keith,

On 12/29/2017 12:23 PM, Keith Busch wrote:
> On Fri, Dec 29, 2017 at 12:54:17PM +0530, Oza Pawandeep wrote:
>> This patch addresses the race condition between AER and DPC for recovery.
>>
>> Current DPC driver does not do recovery, e.g. calling end-point's driver's
>> callbacks, which sanitize the device.
>> DPC driver implements link_reset callback, and calls pci_do_recovery.
> 
> I'm not sure I see why any of this is necessary for two reasons:
> 
> 1. A downstream port containment event disables the link. How can a driver
> sanitize an end device when all the end devices below the containment are
> physically inaccessible? Any attempt to access such devices will just
> end with either CA or UR (depending on DPC control settings). Since we
> already know the failed outcome from attempting to access such devices,
> why do you want the drivers to do anything?

The reset callback to the endpoint driver has a status field indicating
whether the IO is frozen or not. If IO is not frozen, an endpoint driver
can potentially recover from the error by reissuing the failed request. 

If IO is frozen, then the endpoint driver needs to clean up outstanding
resources. It is not safe to just shut down the driver while there are
transactions in flight. This is the reason for the status field: it gives
the driver a chance to clean up any state machines and resources.

Also note that the error callback has a result return value. An endpoint
driver uses it to indicate whether it succeeded in recovering or not.
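
As a rough illustration of the two points above, here is a userspace
sketch of an endpoint driver's error callback. The enums are stand-ins
modeled on the kernel's pci_channel_state_t and pci_ers_result_t from
include/linux/pci.h; struct demo_dev, its fields, and
demo_error_detected() are hypothetical names made up for this example
and are not part of the patch series.

```c
#include <stdbool.h>

/* Userspace stand-ins modeled on the kernel's pci_channel_state_t and
 * pci_ers_result_t. These are simplified and are NOT the real types. */
enum chan_state { CHAN_IO_NORMAL, CHAN_IO_FROZEN, CHAN_IO_PERM_FAILURE };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT };

/* Hypothetical per-device driver state, for illustration only. */
struct demo_dev {
	int inflight;	/* requests submitted but not yet completed */
	int requeued;	/* requests queued to be issued again */
	bool quiesced;	/* true once the driver stops submitting I/O */
};

/* Sketch of an error callback: the status argument decides whether the
 * driver retries or cleans up, and the return value reports the outcome. */
enum ers_result demo_error_detected(struct demo_dev *d, enum chan_state st)
{
	switch (st) {
	case CHAN_IO_NORMAL:
		/* IO is not frozen: reissue the failed requests. */
		d->requeued += d->inflight;
		d->inflight = 0;
		return ERS_CAN_RECOVER;
	case CHAN_IO_FROZEN:
		/* IO is frozen: stop submitting and drop in-flight work
		 * before anyone tears the driver down. */
		d->quiesced = true;
		d->inflight = 0;
		return ERS_NEED_RESET;
	default:
		/* Permanent failure: clean up and give up on the device. */
		d->quiesced = true;
		d->inflight = 0;
		return ERS_DISCONNECT;
	}
}
```

In this model a caller would inspect the returned value to decide
whether to continue with a reset or to treat the device as lost.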


> 
> 2. A DPC event suppresses the error message required for the Linux
> AER driver to run. How can AER and DPC run concurrently?
> 

As we briefly discussed in previous email exchanges, I think you are
looking at a use case with a switch that supports DPC functionality. 

Oza and I are looking at root port functionality with the DPC feature.

As you already know, AER errors are logged to the AER capability
registers independent of the DPC driver's presence.

A root port is also allowed to share the MSI interrupts across DPC and
AER. 

Therefore, when a DPC interrupt fires, both the AER driver and the DPC
driver start recovery work. This is the issue we are trying to deal with.
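
The shared-interrupt situation can be modeled in userspace as below.
This is only a sketch of the problem and of one possible way to
serialize the two services (a test-and-set ownership flag); it is not
necessarily what the patch does. struct demo_port and every demo_*
function are hypothetical names invented for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a root port; these are not kernel structures. */
struct demo_port {
	bool aer_status_logged;	/* an error is logged in the AER capability */
	bool dpc_triggered;	/* the DPC trigger status is set */
	atomic_flag recovering;	/* set while one service owns recovery */
	int recoveries_started;
};

void demo_port_init(struct demo_port *p, bool aer, bool dpc)
{
	p->aer_status_logged = aer;
	p->dpc_triggered = dpc;
	p->recoveries_started = 0;
	atomic_flag_clear(&p->recovering);
}

/* Only the first caller gets to start recovery; later callers back off. */
static bool demo_try_recover(struct demo_port *p)
{
	if (atomic_flag_test_and_set(&p->recovering))
		return false;
	p->recoveries_started++;
	return true;
}

bool demo_dpc_isr(struct demo_port *p)
{
	return p->dpc_triggered && demo_try_recover(p);
}

bool demo_aer_isr(struct demo_port *p)
{
	return p->aer_status_logged && demo_try_recover(p);
}

/* With a shared vector, both handlers run for the same interrupt.
 * Returns how many of them actually started recovery. */
int demo_shared_irq(struct demo_port *p)
{
	int started = 0;

	started += demo_dpc_isr(p);
	started += demo_aer_isr(p);
	return started;
}

/* Called when recovery finishes, so a later error can be handled. */
void demo_recovery_done(struct demo_port *p)
{
	atomic_flag_clear(&p->recovering);
}
```

Without the flag, both handlers in this model would start recovery for
the same event, because the AER status is logged whether or not DPC
also triggered.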

In the end, the driver needs to work for both root ports and switches.
I think you verified it against a switch. We are doing the same for a
root port and submitting the plumbing code.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
