linux-kernel - Re: [PATCH] vfio/pci: Support error recovery

Message-ID: <20161204083047.7e715b09@t450s.home>
Date:   Sun, 4 Dec 2016 08:30:47 -0700
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Cao jin <caoj.fnst@...fujitsu.com>
Cc:     <linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
        <izumi.taku@...fujitsu.com>, <mst@...hat.com>
Subject: Re: [PATCH] vfio/pci: Support error recovery

On Sun, 4 Dec 2016 20:16:42 +0800
Cao jin <caoj.fnst@...fujitsu.com> wrote:

> On 12/01/2016 10:55 PM, Alex Williamson wrote:
> > On Thu, 1 Dec 2016 21:40:00 +0800  
> 
> >>> If an AER fault occurs and the user doesn't do a reset, what
> >>> happens when that device is released and a host driver tries to make
> >>> use of it?  The user makes no commitment to do a reset and there are
> >>> only limited configurations where we even allow the user to perform a
> >>> reset.
> >>>     
> >>
> >> Limited? Do you mean the things __pci_dev_reset() can do?  
> > 
> > I mean that there are significant device and guest configuration
> > restrictions in order to support AER.  For instance, all the functions
> > of the slot need to appear in a PCI-e topology in the guest with all
> > the functions in the right place such that a guest bus reset translates
> > into a host bus reset.  The physical functions cannot be split between
> > guests even if IOMMU isolation would otherwise allow it.  The user
> > needs to explicitly enable AER support for the devices.  A VM needs to
> > be specifically configured for AER support in order to set any sort of
> > expectations of a guest directed bus reset, let alone a guarantee that
> > it will happen.  So all the existing VMs, where functions are split
> > between guests, or the topology isn't exactly right, or AER isn't
> > enabled see a regression from the above change as the device is no
> > longer reset.
> >   
> 
> I am not clear on why these restrictions are set in the current design.
> I took a glance at older versions of qemu's patchset; their idea is to
> translate a guest bus reset into a host bus reset (which is
> unreasonable[*] to me). And I guess that's the *cause* of these
> restrictions?  Is there any other story behind these restrictions?
> 
> [*] In the physical world, setting a bridge's secondary bus reset would
> send a hot-reset TLP to all functions below, triggering every device's
> reset separately. An emulated device should behave the same, which means
> just using each device's DeviceClass->reset method.

Are you trying to say that an FLR is equivalent to a link reset?
Please go read the previous discussions, especially if you're sending
patches you don't believe in.  Thanks,

Alex