linux-kernel - Re: [ 102/127] iommu/amd: Workaround for ERBT1312


Message-ID: <20130628223701.78d17df2@dualc.maya.org>
Date:	Fri, 28 Jun 2013 22:37:01 +0200
From:	Andreas Hartmann <andihartmann@...19freenet.de>
To:	Alex Williamson <alex.williamson@...hat.com>,
	Joerg Roedel <joro@...tes.org>
Cc:	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [ 102/127] iommu/amd: Workaround for ERBT1312

Alex Williamson wrote:
> On Fri, 2013-06-28 at 18:11 +0200, Andreas Hartmann wrote:
>> Hello Joerg, hello Alex,
>>
>> the subsequent patch and the patch "iommu/amd: Re-enable IOMMU event log
>> interrupt after handling." 925fe08bce38d1ff052fe2209b9e2b8d5fbb7f98
>> flood /var/log/messages with the following line (> 700 lines/second)
>> right after loading vfio:
>>
>> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.0 domain=0x0000 address=0x000000fdf9103300 flags=0x0600]
> 
> That's interesting, I PXE boot my system from one NIC then use a
> different NIC for the iSCSI root.  The PXE boot NIC now screams like
> this, _until_ I attach it to vfio, then it quiets down.

Hmm, I just remembered a workaround that is still active on my system.
I implemented it to "resolve" the following error, which has appeared
since I applied a new kvm version, when starting my VM with my Intel PCI
ethernet device passed through:


qemu-kvm: -device vfio-pci,host=06:06.0: vfio: failed to set iommu for container: Device or resource busy

qemu-kvm: -device vfio-pci,host=06:06.0: vfio: failed to setup container for group 12

qemu-kvm: -device vfio-pci,host=06:06.0: vfio: failed to get group 12

qemu-kvm: -device vfio-pci,host=06:06.0: Device 'vfio-pci' could not be initialized


The workaround was to bind each of the individual multifunction devices
to vfio once during boot, release them again after 2 seconds, and rebind
them to the drivers they were originally bound to (if they were bound
to any).

I did this with a script beginning like this:

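#!/bin/sh
modprobe vfio-pci

# Register the vendor/device ID with vfio-pci, then move the device from
# its current driver to vfio-pci.
echo "1002 4385" > /sys/bus/pci/drivers/vfio-pci/new_id
echo 0000:00:14.0 > /sys/bus/pci/devices/0000:00:14.0/driver/unbind
echo 0000:00:14.0 > /sys/bus/pci/drivers/vfio-pci/bind
...

# Keep the devices on vfio-pci for 2 seconds.
sleep 2

# Release the device again and drop the ID from vfio-pci.
echo 0000:00:14.0 > /sys/bus/pci/drivers/vfio-pci/unbind
echo "1002 4385" > /sys/bus/pci/drivers/vfio-pci/remove_id
...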
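
The "rebind to the original driver, if any" step is not visible in the
excerpt above. A minimal sketch of how it could be done for one device
follows. It is not the actual script: the function names and the
SYSFS_PCI variable (which only exists so the logic can be tried against
a fake directory tree) are assumptions made for illustration, and it
assumes the device ID has already been written to new_id as shown above.

```shell
#!/bin/sh
# SYSFS_PCI is a hypothetical override for dry runs; it defaults to the
# real sysfs PCI bus directory.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# Print the name of the driver bound to device $1, or nothing if unbound.
current_driver() {
    link="$SYSFS_PCI/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    fi
}

# Bind device $1 to vfio-pci for $2 seconds (default 2), then hand it
# back to the driver it had before, if it had one.
cycle_through_vfio() {
    dev="$1"
    hold="${2:-2}"
    old="$(current_driver "$dev")"
    if [ -n "$old" ]; then
        echo "$dev" > "$SYSFS_PCI/devices/$dev/driver/unbind"
    fi
    echo "$dev" > "$SYSFS_PCI/drivers/vfio-pci/bind"
    sleep "$hold"
    echo "$dev" > "$SYSFS_PCI/drivers/vfio-pci/unbind"
    if [ -n "$old" ]; then
        echo "$dev" > "$SYSFS_PCI/drivers/$old/bind"
    fi
}
```

It would be called once per device, e.g. "cycle_through_vfio
0000:00:14.0 2", between the new_id and remove_id writes. Remembering
the old driver before the unbind is what avoids the failing write to
driver/unbind for a device that was not bound to anything.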

The logs in messages:

Jun 28 15:54:12 . kernel: [   48.860147] VFIO - User Level meta-driver version: 0.3
Jun 28 15:54:12 . kernel: [   48.875243] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.0 domain=0x0000 address=0x000000fdf9103300 flags=0x0600]
...

Therefore, the log output most probably started right after device 14.0
was bound to vfio. Had it started only after the device was released
from vfio again, I would have expected about 2 seconds between the vfio
start message and the first occurrence of the IO_PAGE_FAULT.
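
The gap between the two messages can be read off the kernel timestamps
in brackets. As a rough sketch, a small helper like the following
computes it from log text on stdin; the name kmsg_delta is made up for
this example, and the two arguments are awk regular expressions
selecting the start and end lines.

```shell
#!/bin/sh
# Print the seconds elapsed between the first line matching pattern $1
# and the first later line matching pattern $2 (kernel log on stdin).
kmsg_delta() {
    awk -v a="$1" -v b="$2" '
        # Extract the "[   secs.usecs]" kernel timestamp from a log line.
        function stamp(line,    s) {
            if (match(line, /\[ *[0-9]+\.[0-9]+\]/)) {
                s = substr(line, RSTART + 1, RLENGTH - 2)
                return s + 0
            }
            return -1
        }
        start == "" && $0 ~ a { start = stamp($0); next }
        start != "" && $0 ~ b { printf "%.6f\n", stamp($0) - start; exit }
    '
}
```

Fed the two lines quoted above with the patterns 'VFIO' and
'IO_PAGE_FAULT', it prints 0.015096, i.e. about 15 ms, which is far
below the 2 second sleep in the script.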

Today, I'm using kvm 1.3.1 and the complete workaround is no longer
necessary. It is enough to bind / unbind the PCI bridge as described
above before starting the VM with the passed-through PCI ethernet
device.
Because I no longer touch the 14.0 device, the IO_PAGE_FAULT messages
have disappeared completely.

@Joerg:
Anyway, I'm going to test your provided patch tomorrow!

BTW: what does IO_PAGE_FAULT mean, and what should I expect when I see
this message?



Thanks,
regards,
Andreas