linux-kernel - Re: [PATCH] vfio/pci: Support error recovery

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161205215949.6b09bc0f@t450s.home>
Date:   Mon, 5 Dec 2016 21:59:49 -0700
From:   Alex Williamson <alex.williamson@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Cao jin <caoj.fnst@...fujitsu.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, izumi.taku@...fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery

On Tue, 6 Dec 2016 05:55:28 +0200
"Michael S. Tsirkin" <mst@...hat.com> wrote:

> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
> > If you're going to take the lead for these AER patches, I would
> > certainly suggest that understanding the reasoning behind the bus reset
> > behavior is a central aspect to this series.  This effort has dragged
> > out for nearly two years and I apologize, but I don't really have a lot
> > of patience for rehashing some of these issues if you're not going to
> > read the previous discussions or consult with your colleagues to
> > understand how we got to this point.  If you want to challenge some of
> > the design points, that's great, it could use some new eyes, but please
> > understand how we got here first.  
> 
> Well I'm guessing Cao jin here isn't the only one not
> willing to plough through all historical versions of the patchset
> just to figure out the motivation for some code.
> 
> Including a summary of a high level architecture couldn't hurt.
> 
> Any chance of writing such?  Alternatively, we can try to build it as
> part of this thread.  Shouldn't be hard as it seems somewhat
> straight-forward on the surface:
> 
> - detect link error on the host, don't reset link as we would normally do

This is actually a new approach that I'm not sure I agree with.  By
skipping the host directed link reset, vfio is taking responsibility
for doing this, but then we just assume the user will do it.  I have
issues with this.

The previous approach was to use the error detected notifier to block
access to the device, allowing the host to perform the link reset.  A
subsequent notification in the AER process released the user access
which allowed the user AER process to proceed.  This did result in both
a host directed and a guest directed link reset, but other than
coordinating the blocking of the user process during host reset, that
hasn't been brought up as an issue previously.

> - report link error to guest
> - detect link reset request from guest
> - reset link on host
> 
> Since link reset will reset all devices behind it, for this to work we
> need same set of devices behind the link in host and guest.  Enforcing
> this would be nice to have.

This is a pretty significant complication and I think it's a
requirement.  This is the heart of why we have an AER vfio-pci device
option and why we require that QEMU should fail to initialize the
device if AER is enabled in an incompatible configuration.  If a user
cannot be sure that AER is enabled on a device, it's pointless to
bother implementing it, though honestly I question the value of it in
the VM altogether given configuration requirements (ie. are users
going to accept the reason that all the ports of a device need to be
assigned to a single guest for guest-based AER recovery when they were
previously able to assign each port to a separate VM?).
 
> - as link now might end up in bad state, reset
>   it when device is unassigned

This is also a new aspect for the approach here, previously we allowed
the host directed link reset so we could assume the device was
sufficiently recovered.  In the proposal here, the AER core skips any
devices bound to vfio-pci, but vfio can't guarantee that we can do a
bus reset on them.  PCI bus isolation is not accounted for in DMA
isolation, which is the basis for IOMMU groups.  A bus can host
multiple IOMMU groups, each of which may have a different user.  Only
in a very specific configuration can vfio do a bus reset.
 
> Any details I missed?

AIUI, the critical feature is that the guest needs to be able to reset
the device link, all the other design elements play out from the
logical expansion of that feature.  It means that a guest bus reset
needs to translate to a host bus reset, which means that all of the
affected host devices need to be accounted for and those that are
assigned to the guest need to be affected in the guest in the same
way.  QEMU must enforce this configuration or else a user cannot know
the result of a AER fault, ie. will it cause a VMSTOP condition or is
the fault forwarded to the guest.  The required configuration
restrictions are quite involved, therefore we can't simply require this
of all configurations, so a vfio-pci device option is added.  The
coordination of the host directed reset with the guest directed reset
was only a complications discovered within the last few revisions of
the series.  As noted above, the previous solution to this was to
attempt to block access to the device while the host reset proceeds.

Clearly it's a little disconcerting if we throw all of that away and
simply assume that an FLR is sufficient to reset the device when it
seems like link issues might be a nontrivial source of AER faults.  If
FLR is sufficient, then why does the core AER handling code in the
kernel not simply do this?  Thanks,

Alex