Message-ID: <20211012231201.xj7fvfgvpde5wwrl@pali>
Date: Wed, 13 Oct 2021 01:12:01 +0200
From: Pali Rohár <pali@...nel.org>
To: Naveen Naidu <naveennaidu479@...il.com>
Cc: Lukas Wunner <lukas@...ner.de>, bhelgaas@...gle.com,
linux-kernel-mentees@...ts.linuxfoundation.org,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Amey Narkhede <ameynarkhede03@...il.com>
Subject: Re: [PATCH 16/22] PCI: pciehp: Use RESPONSE_IS_PCI_ERROR() to check
read from hardware

On Tuesday 12 October 2021 21:35:13 Naveen Naidu wrote:
> On 11/10, Lukas Wunner wrote:
> > On Mon, Oct 11, 2021 at 11:37:33PM +0530, Naveen Naidu wrote:
> > > An MMIO read from a PCI device that doesn't exist or doesn't respond
> > > causes a PCI error. There's no real data to return to satisfy the
> > > CPU read, so most hardware fabricates ~0 data.
> > >
> > > Use RESPONSE_IS_PCI_ERROR() to check the response we get when we read
> > > data from hardware.
> >
> > Actually what happens is that PCI read transactions *time out*,
> > so the host controller fabricates a response.
> >
>
> Ah! yes. Now that I look at it, RESPONSE_IS_PCI_TIMEOUT() does indeed
> seem like a better option to RESPONSE_IS_PCI_ERROR(), since it's more
> specific and depicts the actual condition.

This is not fully correct. 0xffffffff is returned when some error
happens, and it does not have to be a timeout error. Errors like
Unsupported Request, Completer Abort or Configuration Request Retry
Status (when the CRSSVE bit is disabled) are also reported as 0xffffffff
and they do not represent a timeout. For example, Unsupported Request is
returned when you try to read from a non-existent device behind some
PCIe switch.

Also, pci-aardvark.c fabricates the value 0xffffffff when trying to read
from config space below the PCIe Root Port when the PCIe link is not up.

And I have seen Completer Abort returned by a PCIe switch when the
switch itself did not receive a reply from the device below it. So the
controller can receive a reply from some other device even when the
target sent no real reply, which means that a timeout can be reported
by some other message.

So I think that the generic PCI_ERROR is the best name. You do not know
what really happened (only some controller drivers can provide
additional information, and there is no standard HW<-->OS API for it),
and application logic must decide how to process the error.
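
To illustrate the point, here is a minimal userspace sketch. It is not
the code from the patch: the enum, fabricate_response() and the
response_is_pci_error*() helpers are hypothetical names made up for
this example. It only assumes that every failure cause ends up as an
all-ones value of the read width, so the caller cannot tell the causes
apart from the data alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical list of reasons why a read may fail (illustration only). */
enum pci_read_cause {
	CAUSE_TIMEOUT,
	CAUSE_UNSUPPORTED_REQUEST,
	CAUSE_COMPLETER_ABORT,
	CAUSE_LINK_DOWN,
};

/*
 * Hypothetical stand-in for the host controller: whatever the cause,
 * the data handed back for a failed 32-bit read is all ones.
 */
static inline uint32_t fabricate_response(enum pci_read_cause cause)
{
	(void)cause;	/* the cause is lost at this point */
	return UINT32_MAX;
}

/* All-ones checks for 16-bit and 32-bit reads (assumed semantics). */
static inline bool response_is_pci_error16(uint16_t val)
{
	return val == UINT16_MAX;
}

static inline bool response_is_pci_error32(uint32_t val)
{
	return val == UINT32_MAX;
}
```

With this sketch, response_is_pci_error32(fabricate_response(cause)) is
true for every cause in the enum, which is why a name tied to timeouts
only would describe just one of the possible cases.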

> I'll wait for some time and see if others have any objection/a better
> name for the macro and then redo the patch with that.
>
> Thank you very much for the review ^^
>
> > By contrast, a PCI *error* usually denotes an Uncorrectable or
> > Correctable Error as specified in section 6.2.2 of the PCIe Base Spec.
> >
> > Thus something like RESPONSE_IS_PCI_TIMEOUT() or IS_PCI_TIMEOUT() would
> > probably be more appropriate. I'll leave the exact bikeshed color for
> > others to decide. :-)
> >
> >
> > > Signed-off-by: Naveen Naidu <naveennaidu479@...il.com>
> > > ---
> > > drivers/pci/hotplug/pciehp_hpc.c | 10 +++++-----
> > > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > Acked-by: Lukas Wunner <lukas@...ner.de>