Date:	Thu, 17 Mar 2016 10:53:06 -0600
From:	Alex Williamson <alex.williamson@...hat.com>
To:	David Woodhouse <dwmw2@...radead.org>
Cc:	iommu@...ts.linux-foundation.org, joro@...tes.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Ratelimit fault handler

On Tue, 15 Mar 2016 19:47:56 +0000
David Woodhouse <dwmw2@...radead.org> wrote:

> On Tue, 2016-03-15 at 10:35 -0600, Alex Williamson wrote:
> > Fault rates can easily overwhelm the console and make the system
> > unresponsive.  Ratelimit to allow an opportunity for maintenance.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@...hat.com>  
> 
> Rather than just rate-limiting the printk, I'd prefer to handle this
> explicitly. There's a bit in the context-entry which can tell the IOMMU
> not to bother raising an interrupt at all. And then we can re-enable it
> if/when the driver recovers the device. (Or perhaps just when it next
> does a mapping).

It seems we would need to keep statistics per context entry for that;
are you prepared for that sort of overhead?  IME, a device that's
spewing faults at this rate is either broken to the point where it
needs to be removed from the system, or is actively being tested and
debugged for driver or device assignment work.  In those cases, I
think we want to keep reminding the user that something is very
wrong, which probably explains why the device isn't working properly.
If the device is using the DMA API, clearing FPD (the context entry's
Fault Processing Disable bit) on each mapping event might be a way to
do that, but a device managed through the IOMMU API might have very
long-lived mapping entries.  It seems impractical to set up statistics
per context entry, plus timers to check back on them, for things that
really ought to be rare events.  My goal was only to reduce the overall
impact on the system so that it remains usable when this occurs.

> We really ought to be reporting faults to drivers too, FWIW. I keep
> meaning to take a look at that.

Yes, that path has been absent for far too long.  Thanks,

Alex
