lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161214163208.01be036f@t450s.home>
Date:   Wed, 14 Dec 2016 16:32:08 -0700
From:   Alex Williamson <alex.williamson@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Cao jin <caoj.fnst@...fujitsu.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, izumi.taku@...fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery

On Thu, 15 Dec 2016 01:00:06 +0200
"Michael S. Tsirkin" <mst@...hat.com> wrote:

> On Wed, Dec 14, 2016 at 03:47:43PM -0700, Alex Williamson wrote:
> > On Thu, 15 Dec 2016 00:20:20 +0200
> > "Michael S. Tsirkin" <mst@...hat.com> wrote:
> >   
> > > On Tue, Dec 13, 2016 at 08:00:22PM -0700, Alex Williamson wrote:  
> > > > On Wed, 14 Dec 2016 03:58:17 +0200
> > > > "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > > >     
> > > > > On Tue, Dec 13, 2016 at 09:27:59AM -0700, Alex Williamson wrote:    
> > > > > > On Tue, 13 Dec 2016 18:12:34 +0200
> > > > > > "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > > > > >       
> > > > > > > On Mon, Dec 12, 2016 at 08:39:48PM -0700, Alex Williamson wrote:      
> > > > > > > > On Tue, 13 Dec 2016 05:15:13 +0200
> > > > > > > > "Michael S. Tsirkin" <mst@...hat.com> wrote:
> > > > > > > >         
> > > > > > > > > On Mon, Dec 12, 2016 at 03:43:13PM -0700, Alex Williamson wrote:        
> > > > > > > > > > > So just don't do it then. Topology must match between host and guest,
> > > > > > > > > > > except maybe for the case of devices with host driver  (e.g. PF)
> > > > > > > > > > > which we might be able to synchronize against.          
> > > > > > > > > > 
> > > > > > > > > > We're talking about host kernel level handling here.  The host kernel
> > > > > > > > > > cannot defer the link reset to the user under the assumption that the
> > > > > > > > > > user is handling the devices in a very specific way.  The moment we do
> > > > > > > > > > that, we've lost.          
> > > > > > > > > 
> > > > > > > > > The way is same as baremetal though, so why not?        
> > > > > > > > 
> > > > > > > > How do we know this?  What if the user is dpdk?  The kernel is
> > > > > > > > responsible for maintaining the integrity of the system and devices,
> > > > > > > > not the user.
> > > > > > > >         
> > > > > > > > > And if user doesn't do what's expected, we can
> > > > > > > > > do the full link reset on close.        
> > > > > > > > 
> > > > > > > > That's exactly my point, if we're talking about multiple devices,
> > > > > > > > there's no guarantee that the close() for each is simultaneous.  If one
> > > > > > > > function is released before the other we cannot do a bus reset.  If
> > > > > > > > that device is then opened by another user before its sibling is
> > > > > > > > released, then we once again cannot perform a link reset.  I don't
> > > > > > > > think it would be reasonable to mark the released device quarantined
> > > > > > > > until the sibling is released, that would be a terrible user experience.        
> > > > > > > 
> > > > > > > Not sure why you find it so terrible, and I don't think there's another way.      
> > > > > > 
> > > > > > If we can't do it without regressing the support we currently have,
> > > > > > let's not do it at all.      
> > > > > 
> > > > > Why would we regress?  As long as there are no unrecoverable errors,
> > > > > there's no need to change behaviour at all.    
> > > > 
> > > > Currently if a fatal error occurs we allow the host to reset the
> > > > device, so to the best of our knowledge, the device is always reset.
> > > > The proposal here allows gaps where we assume a particular guest
> > > > behavior that allows the device to be returned to the host or opened by
> > > > other users without that reset.  Any plan that relies on a specific
> > > > user behavior is fundamentally wrong imo.
> > > >     
> > > > > Alex, do you have a picture of how error recovery can work in your mind?
> > > > > Your answers seem to imply you do, and these patches don't implement
> > > > > this correctly.  I'm not sure about others, but I for one am unable to
> > > > > piece it together from the comments you provide.  If yes, could you
> > > > > maybe do a short writeup of an architecture you would be comfortable
> > > > > with?    
> > > > 
> > > > Clearly I have issues with this skip-the-host-reset plan, I don't think
> > > > it works.  We cannot assume the user will do error recovery.    
> > > 
> > > Absolutely but we can defer recovery until device close.  
> > 
> > No we can't, as I've tried to describe multiple times, if the functions
> > are part of separate groups then they can be opened and closed
> > asynchronously from each other and we may not have an opportunity where
> > they are all closed together to perform a reset.
> > The only option I can
> > see for this is to quarantine the device, which as I've stated seems
> > like a really poor solution.
> >   
> > > Possibly with userspace invoking an ioctl requesting this,
> > > to make sure we don't break any legacy setups.  
> > 
> > And how do we know that user is not malicious?  How do we police
> > whether they actually perform a reset?  We can only track whether a
> > reset has occurred, which leads back to the quarantine scenario.  
> 
> I don't really know what do you call the quarantine scenario.
> Malicious users get non working devices.
> Why is this poor? 

If we were to skip the host link reset portion of recovery, we'd need
to mark the devices that would have been reset with some sort of dirty
flag.  A user performing a link reset affecting the device would clear
the dirty flag.  If a group is released by the user with devices having
the flag set, the group would be quarantined and we'd need to prevent
the devices within it from being released to the host or the group
being opened by another user.  vfio-pci would need to actively try to
reset the group whenever another group is released.

It's not a malicious user shooting themselves, it's a malicious user
causing a DoS by leaving a device in a tainted state where we cannot
reuse it.  It also inches us ever further up the complexity scale for
this feature.

> > > >  As I
> > > > stated previously we should enable the user to do error recovery
> > > > without depending on the user to do error recovery.  I'm a bit lost why
> > > > we're focusing on this approach when the v9 approach of blocking user
> > > > access to the device during the host recovery seemed like a better
> > > > solution.  I don't think I've said anything bad about that approach,
> > > > but it does need further testing and debugging.  Nobody seems to be
> > > > interested in debugging why it wasn't quite working to understand
> > > > whether that was an implementation issue or a design issue. That's
> > > > currently the leading approach from my perspective.  Thanks,
> > > > 
> > > > Alex    
> > > 
> > > Hmm this doesn't really write up the architecture. It would be
> > > really helpful to have that writeup.  
> > 
> > Perhaps the patch submitters can create one.  
> 
> You already spent so many words on this - won't it
> be easier for you to just write up what you consider
> the right thing to do?
>
> > > Do I guess right that what v9 tried to do is:
> > > - block future userspace accesses
> > > - do full reset
> > > - re-enable userspace accesses
> > > 
> > > ?
> > > 
> > > The obvious issue with that is that device loses its state apparently at
> > > random, and this is not something that happens on baremetal, ever.
> > > 
> > > Thus I don't see how this can work without some PV code in guest.  
> > 
> > So there is no link error that can occur on hardware where the device
> > becomes inaccessible?  I'm pretty sure I've experienced cases
> > otherwise.  
> 
> There are errors like this (losing link) but this is IMO a major error
> which isn't covered by AER.
> 
> It's possible that some guests have AER code that happens to recover
> from lost link, but IMO that's never guaranteed.
> 
> Converting an uncorrectable error like a lost write into link loss
> means escalating the problem to a whole different level.

If a host induced link reset cannot work, then we're at an impasse and
I have no recommended architecture to write-up.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ