linux-kernel - Re: [PATCH] vfio/pci: Support error recovery

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161206083556.23be6ee5@t450s.home>
Date:   Tue, 6 Dec 2016 08:35:56 -0700
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Cao jin <caoj.fnst@...fujitsu.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        <linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
        <izumi.taku@...fujitsu.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery

On Tue, 6 Dec 2016 18:46:04 +0800
Cao jin <caoj.fnst@...fujitsu.com> wrote:

> On 12/06/2016 12:59 PM, Alex Williamson wrote:
> > On Tue, 6 Dec 2016 05:55:28 +0200
> > "Michael S. Tsirkin" <mst@...hat.com> wrote:
> >   
> >> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:  
> >>> If you're going to take the lead for these AER patches, I would
> >>> certainly suggest that understanding the reasoning behind the bus reset
> >>> behavior is a central aspect to this series.  This effort has dragged
> >>> out for nearly two years and I apologize, but I don't really have a lot
> >>> of patience for rehashing some of these issues if you're not going to
> >>> read the previous discussions or consult with your colleagues to
> >>> understand how we got to this point.  If you want to challenge some of
> >>> the design points, that's great, it could use some new eyes, but please
> >>> understand how we got here first.    
> >>
> >> Well I'm guessing Cao jin here isn't the only one not
> >> willing to plough through all historical versions of the patchset
> >> just to figure out the motivation for some code.
> >>
> >> Including a summary of a high level architecture couldn't hurt.
> >>
> >> Any chance of writing such?  Alternatively, we can try to build it as
> >> part of this thread.  Shouldn't be hard as it seems somewhat
> >> straight-forward on the surface:
> >>
> >> - detect link error on the host, don't reset link as we would normally do  
> > 
> > This is actually a new approach that I'm not sure I agree with.  By
> > skipping the host directed link reset, vfio is taking responsibility
> > for doing this, but then we just assume the user will do it.  I have
> > issues with this.
> > 
> > The previous approach was to use the error detected notifier to block
> > access to the device, allowing the host to perform the link reset.  A
> > subsequent notification in the AER process released the user access
> > which allowed the user AER process to proceed.  This did result in both
> > a host directed and a guest directed link reset, but other than
> > coordinating the blocking of the user process during host reset, that
> > hasn't been brought up as an issue previously.
> >   
> 
> Tests on previous versions didn't bring up issues as I find, I think
> that is because we didn't test it completely. As I know, before August
> of this year, we didn't have cable connected to NIC, let alone
> connecting NIC to gateway.

Lack of testing has been a significant issue throughout the development
of this series.

> Even if I fixed the guest oops issue in igb driver that Alex found in
> v9, v9 still cannot work in my test. And in my test, disable link
> reset(in host) in aer core for vfio-pci is the most significant step to
> get my test passed.

But is it the correct step?  I'm not convinced.  Why did blocking guest
access not work?  How do you plan to manage vfio taking the
responsibility to perform a bus reset when you don't know whether QEMU
is the user of the device or whether the user supports AER recovery?
 
> >> - report link error to guest
> >> - detect link reset request from guest
> >> - reset link on host
> >>
> >> Since link reset will reset all devices behind it, for this to work we
> >> need same set of devices behind the link in host and guest.  Enforcing
> >> this would be nice to have.  
> > 
> > This is a pretty significant complication and I think it's a
> > requirement.  This is the heart of why we have an AER vfio-pci device
> > option and why we require that QEMU should fail to initialize the
> > device if AER is enabled in an incompatible configuration.  If a user
> > cannot be sure that AER is enabled on a device, it's pointless to
> > bother implementing it, though honestly I question the value of it in
> > the VM altogether given configuration requirements (ie. are users
> > going to accept the reason that all the ports of a device need to be
> > assigned to a single guest for guest-based AER recovery when they were
> > previously able to assign each port to a separate VM?).
> >    
> >> - as link now might end up in bad state, reset
> >>   it when device is unassigned  
> > 
> > This is also a new aspect for the approach here, previously we allowed
> > the host directed link reset so we could assume the device was
> > sufficiently recovered.  In the proposal here, the AER core skips any
> > devices bound to vfio-pci, but vfio can't guarantee that we can do a
> > bus reset on them.  PCI bus isolation is not accounted for in DMA
> > isolation, which is the basis for IOMMU groups.  A bus can host
> > multiple IOMMU groups, each of which may have a different user.  Only
> > in a very specific configuration can vfio do a bus reset.
> >    
> >> Any details I missed?  
> > 
> > AIUI, the critical feature is that the guest needs to be able to reset
> > the device link, all the other design elements play out from the
> > logical expansion of that feature.  It means that a guest bus reset
> > needs to translate to a host bus reset, which means that all of the
> > affected host devices need to be accounted for and those that are
> > assigned to the guest need to be affected in the guest in the same
> > way.  QEMU must enforce this configuration or else a user cannot know
> > the result of a AER fault, ie. will it cause a VMSTOP condition or is
> > the fault forwarded to the guest.  The required configuration
> > restrictions are quite involved, therefore we can't simply require this
> > of all configurations, so a vfio-pci device option is added.  The
> > coordination of the host directed reset with the guest directed reset
> > was only a complications discovered within the last few revisions of
> > the series.  As noted above, the previous solution to this was to
> > attempt to block access to the device while the host reset proceeds.
> > 
> > Clearly it's a little disconcerting if we throw all of that away and
> > simply assume that an FLR is sufficient to reset the device when it
> > seems like link issues might be a nontrivial source of AER faults.  If
> > FLR is sufficient, then why does the core AER handling code in the
> > kernel not simply do this?  Thanks,
> >   
> 
> I agree with the points here.  Now I understand why  translate a guest
> link reset to host link reset[*], and FLR shouldn't be equivalent to
> link reset, but the PITY is, we can't trigger a real fatal error to see
> if a FLR is sufficient to reset the device.

In this case, it's not a matter of finding a scenario where it works.
Using FLR to recover from a fatal error would need to be supported by
verbiage in a specification.  I'm not interested in cases where it
happens to work on single device for a certain type of error.  The link
reset is the bare metal mechanism for recovering from a fatal error and
it should be our mechanism as well unless there's spec wording to
support another approach.  Thanks,

Alex