Date:	Thu, 11 Jun 2015 16:40:12 +0100
From:	David Woodhouse <dwmw2@...radead.org>
To:	"Li, Zhen-Hua" <zhen-hual@...com>
Cc:	indou.takao@...fujitsu.com, bhe@...hat.com, joro@...tes.org,
	vgoyal@...hat.com, dyoung@...hat.com, tom.vaden@...com,
	rwright@...com, linux-pci@...r.kernel.org,
	kexec@...ts.infradead.org, iommu@...ts.linux-foundation.org,
	lisa.mitchell@...com, linux-kernel@...r.kernel.org,
	doug.hatch@...com, ishii.hironobu@...fujitsu.com,
	bhelgaas@...gle.com, billsumnerlinux@...il.com, li.zhang6@...com
Subject: Re: [PATCH v10 0/10] iommu/vt-d: Fix intel vt-d faults in kdump kernel

On Fri, 2015-04-10 at 16:42 +0800, Li, Zhen-Hua wrote:
> This patchset is an update of Bill Sumner's patchset, implements a fix for:
> If a kernel boots with intel_iommu=on on a system that supports intel vt-d, 
> when a panic happens, the kdump kernel will boot with these faults:

But, in the general case, it *does* boot.

There are two cases where it doesn't actually boot, and those are the
interesting ones.

Firstly, a device just keeps generating faults and we die in an
interrupt storm, reporting the same fault over and over again. That can
actually happen without kdump/kexec and the correct fix for that is to
have rate-limiting, disable fault reporting for the offending device
after too many are seen, and then eventually to tie it in to the PCIe
error handling as has been discussed elsewhere.
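To make the rate-limiting idea concrete, here is a rough userspace sketch in C. It is not code from the intel-iommu driver: the struct, the function name and all three thresholds are made up for illustration, and real code would also have to mask fault reporting for the device in hardware rather than just stop logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All three limits are assumed values, chosen only for the example. */
#define FAULT_WINDOW_SECS   5    /* length of one reporting window */
#define FAULT_BURST         10   /* faults reported per window */
#define FAULT_DISABLE_AFTER 100  /* total faults before the device is muted */

/* Hypothetical per-device fault accounting. */
struct fault_state {
    uint64_t window_start;       /* start of the current window, in seconds */
    unsigned int in_window;      /* faults seen in the current window */
    unsigned int total;          /* faults seen since boot */
    bool reporting_disabled;     /* set once the device has been muted */
};

/* Returns true if this fault should be reported, false if it is suppressed. */
static bool fault_should_report(struct fault_state *fs, uint64_t now)
{
    assert(fs != NULL);

    if (fs->reporting_disabled)
        return false;

    /* Too many faults overall: stop reporting for this device for good. */
    if (++fs->total >= FAULT_DISABLE_AFTER) {
        fs->reporting_disabled = true;
        return false;
    }

    /* Start a new window once the old one has expired. */
    if (now - fs->window_start >= FAULT_WINDOW_SECS) {
        fs->window_start = now;
        fs->in_window = 0;
    }

    /* Report only the first FAULT_BURST faults of each window. */
    return fs->in_window++ < FAULT_BURST;
}
```

A caller would keep one fault_state per device and consult fault_should_report() before logging each fault. Tying this in to the PCIe error handling is left out of the sketch.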

Secondly, there are devices which do not correctly respond to a
hardware reset. This is broken hardware, and if we really have to copy
the old contexts from the crashed kernel to work around it then I'd
like it to be on a blacklist basis: we do it only for hardware which
is *known* to be broken in this way.
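As a sketch of what "on a blacklist basis" could look like, the C fragment below matches a device against a small table of vendor/device IDs. The table entries are placeholders (0xffff is not a valid PCI vendor ID), since no broken device has been named in this thread, and the struct and function names are hypothetical rather than taken from the kernel's PCI quirk code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifier pair; the kernel has its own PCI ID structures. */
struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Placeholder entries only: no real device is listed here. */
static const struct pci_id reset_broken_devices[] = {
    { 0xffff, 0x0001 },
    { 0xffff, 0x0002 },
};

/*
 * Returns true only for devices known not to respond to a hardware reset,
 * i.e. the only ones for which the old contexts would be copied.
 */
static bool needs_context_copy(const struct pci_id *id)
{
    size_t n = sizeof(reset_broken_devices) / sizeof(reset_broken_devices[0]);

    assert(id != NULL);

    for (size_t i = 0; i < n; i++) {
        if (reset_broken_devices[i].vendor == id->vendor &&
            reset_broken_devices[i].device == id->device)
            return true;
    }
    return false;
}
```

Every device not in the table gets the normal reset path, so a device with a broken reset still shows up as a fault instead of being papered over.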

(There are also some cases where the device driver doesn't even *try* to
reset the hardware and just assumes it'll find it in a sane state as
the BIOS or a cleanly shut down kexec would have left it. In those
cases of course we can just fix the driver).

I don't much like the idea of doing this context copy for *all*
hardware. That's masking hardware issues with reset that we really
*ought* to be finding.

I believe that most of the offending hardware is HP's; they like to do
the most, erm, "interesting" things with odd hardware and RMRRs and
stuff. So Zhen-Hua, would you be able to provide the list of broken
devices that HP has shipped, for the purpose of such a blacklist?

I assume you've already contacted the hardware folks responsible and
insisted that their devices be fixed to be resettable, right?

-- 
dwmw2
