[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180430105424.3f12f792@jacob-builder>
Date: Mon, 30 Apr 2018 10:54:24 -0700
From: Jacob Pan <jacob.jun.pan@...ux.intel.com>
To: Jean-Philippe Brucker <jean-philippe.brucker@....com>
Cc: "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
Joerg Roedel <joro@...tes.org>,
David Woodhouse <dwmw2@...radead.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Alex Williamson <alex.williamson@...hat.com>,
Rafael Wysocki <rafael.j.wysocki@...el.com>,
"Liu, Yi L" <yi.l.liu@...el.com>,
"Tian, Kevin" <kevin.tian@...el.com>,
Raj Ashok <ashok.raj@...el.com>,
Christoph Hellwig <hch@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
jacob.jun.pan@...ux.intel.com
Subject: Re: [PATCH v4 14/22] iommu: handle page response timeout
On Mon, 30 Apr 2018 11:58:10 +0100
Jean-Philippe Brucker <jean-philippe.brucker@....com> wrote:
> On 25/04/18 16:37, Jacob Pan wrote:
> >> In the other cases (unsupported PRI or rogue guest) then disabling
> >> PRI using a FAILURE status might be the right thing to do. However,
> >> assuming the device follows the PCI spec it will stop sending page
> >> requests once there are as many PPRs in flight as the allocated
> >> credit.
> >>
> > Agreed, here I am not taking any actions. There may be need to drain
> > in-fly requests.
>
> Right, as long as we first ensure that no new fault is generated (by
> using a Response Failure). Though in my opinion not taking action
> might be the safest option :)
>
> Another thought: currently the comment in iommu.h says
> "@IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent
> faults from this device if possible. This is "Response Failure" in
> PCI PRI."
>
> I wonder if we should simply say "Drop all subsequent faults from the
> device". Even if the PCI device doesn't properly implement PRI, the
> IOMMU driver should set a "PRI disabled" bit in the device data that
> prevents it from from reporting new faults and flooding the queue.
> Anyway, it's a small detail that could go in a future patch series.
>
right, we should disable PRI and let future PRQ response pending on
re-enabling of PRI on the device. I will leave that to future
enhancement.
> >> If there isn't any possibility of memory leak or abusing
> >> resources, I don't think it's our problem that the guest is
> >> excessively slow at handling page requests. Setting an upper bound
> >> to page request latency might do more harm than good. Ensuring
> >> that devices respect the number of allocated in-flight PPRs is
> >> more important in my opinion.
> > How about we have a really long timeout, e.g. 1 min similar to
> > device invalidate response timeout in ATS spec., just for basic
> > safety and diagnosis. Optionally, we could have quota in parallel.
>
> I agree that for development a timeout is useful. It might be worth
> adding it as an option to the IOMMU module instead of a define.
> Perhaps a number of seconds, 10 being the default and 0 disabling the
> timeout? Otherwise we would probably end up with a succession of
> patches incrementing the timeout by arbitrary values, if people find
> it inconvenient.
>
make sense. will do.
Powered by blists - more mailing lists