[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20230617083108.GB26803@wunner.de>
Date: Sat, 17 Jun 2023 10:31:08 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@...ux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org,
Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
Subject: Re: [PATCH v1] PCI: pciehp: Make sure DPC trigger status is reset in
PDC handler
On Fri, Jun 16, 2023 at 04:27:33PM -0700, Sathyanarayanan Kuppuswamy wrote:
> On 6/16/23 2:06 AM, Lukas Wunner wrote:
> > On Thu, Jun 15, 2023 at 04:03:54PM -0700, Sathyanarayanan Kuppuswamy wrote:
> >> On 6/15/23 11:35 AM, Lukas Wunner wrote:
> >>> On Wed, Jun 14, 2023 at 11:25:59PM -0700, Kuppuswamy Sathyanarayanan wrote:
> >>>> During the EDR-based DPC recovery process, for devices with persistent
> >>>> issues, the firmware may choose not to handle the DPC error and leave
> >>>> the port in DPC triggered state. In such scenarios, if the user
> >>>> replaces the faulty device with a new one, the OS is expected to clear
> >>>> the DPC trigger status in the hotplug error handler to enable the new
> >>>> device enumeration.
> > [...]
> >>> pciehp_unconfigure_device() seems like a more appropriate place to me.
> >>
> >> I initially thought to add it there. Spec also recommends clearing it
> >> when removing the device. But I wasn't sure if pciehp_unconfigure_device()
> >> would be called only during device removal.
> >
> > It is.
>
> Do you know how pciehp_unconfigure_device() will be called when the device is
> removed? Is it due to a DLLSC event or a PDC state change? If it is DLLSC,
> pciehp_unconfigure_device() may not be called because we ignore the DLLSC
> event if there is an active DPC trigger.
A DLLSC event is only ignored by pciehp if successful recovery
from DPC was possible. In that case, the device in the slot is
still the *same*.
And you're saying in the commit message that you seek to fix the
case when "the user replaces the faulty device with a *new* one".
So for your goal, ignored DLLSC events shouldn't be relevant.
As to your question, pciehp_ist() doesn't really differentiate
between PDC and DLLSC events. If either of them occurs,
the slot is brought down through pciehp_handle_presence_or_link_change().
And that will end up in pciehp_unconfigure_device().
I think right at the end of that function, after
pci_unlock_rescan_remove() would be a good place to perform
cleanup of the hotplug port before a newly hotplugged device
can be handled.
Smita's clearing of ARI Forwarding Enable, AtomicOp Requester Enable
and 10-Bit Tag Requester Enable could be done as well at the end of
pciehp_unconfigure_device().
> > We need to differentiate between two cases:
> >
> > (1) DPC handled by firmware, hotplug handled by OS:
> >
> > In this case clearing DPC trigger status from pciehp device removal
> > code path seems reasonable. But it must be constrained to
> > !host_bridge->native_dpc.
>
> Agree.
The crucial thing here is that the dpc driver isn't bound to the
port if DPC is handled by firmware. Hence there's no chance of
racing with dpc_reset_link() and it's safe to clear DPC trigger
status from pciehp.
> > (2) DPC handled by OS:
> >
> > In this case clearing DPC trigger status from pciehp could race with
> > the dpc interrupt handler so must not be done. Instead, I recommend
>
> If we clear the DPC trigger status in the DLLSC state change handler, I
> agree there could be a race. However, if we clear the DPC trigger in the
> PDC state change handler, I believe it will not race because the device
> has already been removed. Is my understanding correct?
Unfortunately it's not guaranteed that dpc_reset_link() is finished
when pciehp brings down the slot.
So clearing DPC trigger status can only be done safely from the dpc
driver itself (if DPC is handled natively by the OS).
Thanks,
Lukas
Powered by blists - more mailing lists