Date:	Tue, 20 Nov 2012 14:09:46 +0000
From:	"Pandarathil, Vijaymohan R" <vijaymohan.pandarathil@...com>
To:	Stefan Hajnoczi <stefanha@...il.com>
CC:	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru devices
 assigned to KVM guests



> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@...il.com]
> Sent: Tuesday, November 20, 2012 5:41 AM
> To: Pandarathil, Vijaymohan R
> Cc: kvm@...r.kernel.org; linux-pci@...r.kernel.org; qemu-devel@...gnu.org;
> linux-kernel@...r.kernel.org
> Subject: Re: [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru
> devices assigned to KVM guests
> 
> On Tue, Nov 20, 2012 at 06:31:48AM +0000, Pandarathil, Vijaymohan R wrote:
> > Add support for error containment when a PCI pass-thru device assigned
> > to a KVM guest encounters an error. This is for PCIe devices/drivers
> > that support AER functionality. When the OS is notified of an error in
> > a device either through the firmware first approach or through an
> > interrupt handled by the AER root port driver, concerned subsystems are
> > notified by invoking callbacks registered by these subsystems. The
> > device is also marked as tainted till the corresponding driver recovery
> > routines are successful.
> >
> > KVM module registers for a notification of such errors. In the KVM
> > callback routine, a global counter is incremented to keep track of the
> > error notification. Before each CPU enters guest mode to execute guest
> > code, appropriate checks are done to see if the impacted device belongs
> > to the guest or not. If the device belongs to the guest, qemu hypervisor
> > for the guest is informed and the guest is immediately brought down,
> > thus preventing or minimizing chances of any bad data being written out
> > by the guest driver after the device has encountered an error.
> 
> I'm surprised that the hypervisor would shut down the guest when PCIe
> AER kicks in for a pass-through device.  Shouldn't we pass the AER event
> into the guest and deal with it there?

Agreed. That would be the ideal behavior, and it is planned for a future
patch. Passing the event along to the guest also raises a concern: we have
no control over the type or capabilities of the OS and drivers running in
the guest.

My understanding is that in the current implementation of Linux/KVM, these
errors are not handled at all and can potentially cause a guest hang, a
crash, or even data corruption, depending on the implementation of the
guest driver for the device. As a first step, these patches improve on that
by containing the error with predictable behavior when such errors occur.
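
As a rough illustration of the containment flow described in the cover
letter quoted above (a global counter bumped in the error callback, then
checked before guest entry), here is a minimal userspace model in C. It is
only a sketch under stated assumptions: every name in it (pt_device, guest,
aer_error_count, aer_error_notify, guest_enter_check) is hypothetical and
does not come from the patches, and the real KVM code paths, locking, and
the notification to qemu are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names are taken from the patches. */
#define MAX_ASSIGNED 4

struct pt_device {
    int id;        /* stand-in for a PCI bus/device/function identifier */
    bool tainted;  /* set on error; would be cleared after driver recovery */
};

struct guest {
    const struct pt_device *assigned[MAX_ASSIGNED]; /* pass-thru devices */
    size_t nr_assigned;
    unsigned long seen_errors; /* global counter value at the last check */
    bool running;
};

/* Global counter incremented by the error callback (assumed name). */
static unsigned long aer_error_count;

/* Models the callback invoked when an error is reported for a device. */
void aer_error_notify(struct pt_device *dev)
{
    assert(dev != NULL);
    dev->tainted = true;
    aer_error_count++;
}

/*
 * Models the check done before a CPU enters guest mode.
 * Returns true if the guest may run, false if it has been brought down.
 */
bool guest_enter_check(struct guest *g)
{
    assert(g != NULL);

    /* Fast path: no new error notifications since the last check. */
    if (g->seen_errors == aer_error_count)
        return g->running;

    g->seen_errors = aer_error_count;

    /* Slow path: does any impacted device belong to this guest? */
    for (size_t i = 0; i < g->nr_assigned; i++) {
        if (g->assigned[i]->tainted) {
            g->running = false; /* stands in for bringing the guest down */
            break;
        }
    }
    return g->running;
}
```

In this model, a guest that does not own the failing device keeps running;
it only pays for one scan of its assigned devices after each new error
notification.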

> 
> The equivalent to this policy on physical hardware would be that the CPU
> is reset or the machine is powered down on AER.  That doesn't sound
> right.
> 
> Stefan