linux-kernel - Re: [PATCH v4 14/22] iommu: handle page response timeout

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Mon, 30 Apr 2018 10:54:24 -0700
From:   Jacob Pan <jacob.jun.pan@...ux.intel.com>
To:     Jean-Philippe Brucker <jean-philippe.brucker@....com>
Cc:     "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Joerg Roedel <joro@...tes.org>,
        David Woodhouse <dwmw2@...radead.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Alex Williamson <alex.williamson@...hat.com>,
        Rafael Wysocki <rafael.j.wysocki@...el.com>,
        "Liu, Yi L" <yi.l.liu@...el.com>,
        "Tian, Kevin" <kevin.tian@...el.com>,
        Raj Ashok <ashok.raj@...el.com>,
        Christoph Hellwig <hch@...radead.org>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        jacob.jun.pan@...ux.intel.com
Subject: Re: [PATCH v4 14/22] iommu: handle page response timeout

On Mon, 30 Apr 2018 11:58:10 +0100
Jean-Philippe Brucker <jean-philippe.brucker@....com> wrote:

> On 25/04/18 16:37, Jacob Pan wrote:
> >> In the other cases (unsupported PRI or rogue guest) then disabling
> >> PRI using a FAILURE status might be the right thing to do. However,
> >> assuming the device follows the PCI spec it will stop sending page
> >> requests once there are as many PPRs in flight as the allocated
> >> credit.
> >>  
> > Agreed, here I am not taking any actions. There may be need to drain
> > in-fly requests.  
> 
> Right, as long as we first ensure that no new fault is generated (by
> using a Response Failure). Though in my opinion not taking action
> might be the safest option :)
> 
> Another thought: currently the comment in iommu.h says
> "@IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent
> faults from this device if possible. This is "Response Failure" in
> PCI PRI."
> 
> I wonder if we should simply say "Drop all subsequent faults from the
> device". Even if the PCI device doesn't properly implement PRI, the
> IOMMU driver should set a "PRI disabled" bit in the device data that
> prevents it from from reporting new faults and flooding the queue.
> Anyway, it's a small detail that could go in a future patch series.
> 
right, we should disable PRI and let future PRQ response pending on
re-enabling of PRI on the device. I will leave that to future
enhancement.
> >> If there isn't any possibility of memory leak or abusing
> >> resources, I don't think it's our problem that the guest is
> >> excessively slow at handling page requests. Setting an upper bound
> >> to page request latency might do more harm than good. Ensuring
> >> that devices respect the number of allocated in-flight PPRs is
> >> more important in my opinion. 
> > How about we have a really long timeout, e.g. 1 min similar to
> > device invalidate response timeout in ATS spec., just for basic
> > safety and diagnosis. Optionally, we could have quota in parallel.  
> 
> I agree that for development a timeout is useful. It might be worth
> adding it as an option to the IOMMU module instead of a define.
> Perhaps a number of seconds, 10 being the default and 0 disabling the
> timeout? Otherwise we would probably end up with a succession of
> patches incrementing the timeout by arbitrary values, if people find
> it inconvenient.
> 
make sense. will do.