linux-kernel - Re: [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru devices assigned to KVM guests

Message-ID: <20121120134104.GI27378@stefanha-thinkpad.redhat.com>
Date:	Tue, 20 Nov 2012 14:41:04 +0100
From:	Stefan Hajnoczi <stefanha@...il.com>
To:	"Pandarathil, Vijaymohan R" <vijaymohan.pandarathil@...com>
Cc:	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru devices
 assigned to KVM guests

On Tue, Nov 20, 2012 at 06:31:48AM +0000, Pandarathil, Vijaymohan R wrote:
> Add support for error containment when a PCI pass-thru device assigned to a KVM
> guest encounters an error. This is for PCIe devices/drivers that support AER
> functionality. When the OS is notified of an error in a device either
> through the firmware first approach or through an interrupt handled by the AER
> root port driver, concerned subsystems are notified by invoking callbacks
> registered by these subsystems. The device is also marked as tainted till the
> corresponding driver recovery routines are successful. 
> 
> KVM module registers for a notification of such errors. In the KVM callback
> routine, a global counter is incremented to keep track of the error
> notification. Before each CPU enters guest mode to execute guest code,
> appropriate checks are done to see if the impacted device belongs to the guest
> or not. If the device belongs to the guest, qemu hypervisor for the guest is
> informed and the guest is immediately brought down, thus preventing or
> minimizing chances of any bad data being written out by the guest driver
> after the device has encountered an error.

I'm surprised that the hypervisor would shut down the guest when PCIe
AER kicks in for a pass-through device.  Shouldn't we pass the AER event
into the guest and deal with it there?

The equivalent of this policy on physical hardware would be that the CPU
is reset or the machine is powered down on AER.  That doesn't sound
right.

Stefan