Message-ID: <20181109072953.ox7qfpnibb7drmf6@wunner.de>
Date: Fri, 9 Nov 2018 08:29:53 +0100
From: Lukas Wunner <lukas@...ner.de>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Bjorn Helgaas <helgaas@...nel.org>,
Alexandru Gagniuc <mr.nuke.me@...il.com>,
linux-pci@...r.kernel.org, keith.busch@...el.com,
alex_gagniuc@...lteam.com, austin_bolen@...l.com,
shyam_iyer@...l.com, linux-kernel@...r.kernel.org,
Jonathan Derrick <jonathan.derrick@...el.com>,
Russell Currey <ruscur@...sell.cc>,
Sam Bobroff <sbobroff@...ux.ibm.com>,
Oliver O'Halloran <oohall@...il.com>,
linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

On Thu, Nov 08, 2018 at 02:01:17PM -0800, Greg Kroah-Hartman wrote:
> On Thu, Nov 08, 2018 at 02:09:17PM -0600, Bjorn Helgaas wrote:
> > I'm having second thoughts about this. One thing I'm uncomfortable
> > with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
>
> I think my stance always has been that this call is not good at all
> because once you call it you never really know if it is still true as
> the device could have been removed right afterward.
>
> So almost any code that relies on it is broken, there is no locking and
> it can and will race and you will lose.

Hm, to be honest, if that's your impression, I think you must have
missed a large portion of the discussion we've been having over the
past two years.

Please consider reading this LWN article, particularly the "Surprise
removal" section, to get up to speed:
https://lwn.net/Articles/767885/

You seem to be assuming that all we care about is the *return value* of
an MMIO read. However, a transaction to a surprise-removed device has
side effects beyond returning all ones, such as a Completion Timeout.
With thousands of transactions in flight, those timeouts added up to
many seconds when handling the removal of an NVMe array, and they
occasionally caused MCEs (Machine Check Exceptions).

It is not an option to just blindly carry out device accesses even though
it is known the device is gone, Completion Timeouts be damned.

However, there is more to it than just Completion Timeouts; this is all
detailed in the LWN article.

Thanks,
Lukas