lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20210328095332.GA8657@wunner.de>
Date:   Sun, 28 Mar 2021 11:53:32 +0200
From:   Lukas Wunner <lukas@...ner.de>
To:     "Kuppuswamy, Sathyanarayanan" 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>
Cc:     Sathyanarayanan Kuppuswamy Natarajan 
        <sathyanarayanan.nkuppuswamy@...il.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Linux PCI <linux-pci@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        Keith Busch <kbusch@...nel.org>, knsathya@...nel.org,
        Sinan Kaya <okaya@...nel.org>
Subject: Re: [PATCH v2 1/1] PCI: pciehp: Skip DLLSC handling if DPC is
 triggered

On Wed, Mar 17, 2021 at 01:02:07PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> On 3/17/21 12:01 PM, Lukas Wunner wrote:
> > If the events are ignored, the driver of the device in the hotplug slot
> > is not unbound and rebound.  So the driver must be able to cope with
> > loss of TLPs during DPC recovery and it must be able to cope with
> > whatever state the endpoint device is in after DPC recovery.
> > Is this really safe?  How does the nvme driver deal with it?
> 
> During DPC recovery, in pcie_do_recovery() function, we use
> report_frozen_detected() to notify all devices attached to the port
> about the fatal error. After this notification, we expect all
> affected devices to halt its IO transactions.
> 
> Regarding state restoration, after successful recovery, we use
> report_slot_reset() to notify about the slot/link reset. So device
> drivers are expected to restore the device to working state after this
> notification.

Thanks a lot for the explanation.


> I am not sure how pure firmware DPC recovery works. Is there a platform
> which uses this combination? For firmware DPC model, spec does not clarify
> following points.
> 
> 1. Who will notify the affected device drivers to halt the IO transactions.
> 2. Who is responsible to restore the state of the device after link reset.
> 
> IMO, pure firmware DPC does not support seamless recovery. I think after it
> clears the DPC trigger status, it might expect hotplug handler be responsible
> for device recovery.
> 
> I don't want to add fix to the code path that I don't understand. This is the
> reason for extending this logic to pure firmware DPC case.

I agree, let's just declare synchronization of pciehp with
pure firmware DPC recovery as unsupported for now.


I've just submitted a refined version of my patch to the list:
https://lore.kernel.org/linux-pci/b70e19324bbdded90b728a5687aa78dc17c20306.1616921228.git.lukas@wunner.de/

If you could give this new version a whirl I'd be grateful.

This version contains more code comments and kernel-doc.

There's now a check in dpc_completed() whether the DPC Status
register contains "all ones", which can happen when a DPC-capable
hotplug port is hot-removed, i.e. for cascaded DPC-capable hotplug
ports.

I've also realized that the previous version was prone to races
which are theoretical but should nonetheless be avoided:
E.g., previously the DLLSC event was only removed from "events"
if the link is still up after DPC recovery.  However if DPC
triggers and recovers multiple times in a row, the link may
happen to be up but a new DLLSC event may have been picked up
in "pending_events" which should be ignored.  I've solved this
by inverting the logic such that DLLSC is *always* removed from
"events", and if the link is unexpectedly *down* after successful
recovery, a DLLSC event is synthesized.

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ