Date:	Thu, 2 Oct 2014 09:09:50 -0600
From:	Bjorn Helgaas <bhelgaas@...gle.com>
To:	"Li, ZhenHua" <zhen-hual@...com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	Joerg Roedel <joro@...tes.org>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	Bruce Allan <bruce.w.allan@...el.com>,
	Carolyn Wyborny <carolyn.wyborny@...el.com>,
	Don Skidmore <donald.c.skidmore@...el.com>,
	Greg Rose <gregory.v.rose@...el.com>,
	Alex Duyck <alexander.h.duyck@...el.com>,
	John Ronciak <john.ronciak@...el.com>,
	Mitch Williams <mitch.a.williams@...el.com>,
	Linux NICS <linux.nics@...el.com>,
	"e1000-devel@...ts.sourceforge.net" 
	<e1000-devel@...ts.sourceforge.net>, linda.knippers@...com
Subject: Re: [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card

On Tue, Sep 30, 2014 at 12:15 AM, Li, ZhenHua <zhen-hual@...com> wrote:
> Adding Joerg to the CC list, since this is also related to the iommu module.
>
> Joerg,
> There was an earlier attempt to fix this DMAR fault:
>         https://lkml.org/lkml/2014/8/18/118
>
> This patch is trying to fix the same thing.
>
>
> Zhenhua
>
> On 09/30/2014 02:09 PM, Li, Zhen-Hua wrote:
>>
>> On an HP system with an Intel Corporation 82599 Ethernet adapter, when the
>> kernel crashes and the kdump kernel boots with intel_iommu=on, the adapter
>> may issue unexpected DMA requests, which cause DMA remapping faults like:
>>      dmar: DRHD: handling fault status reg 102
>>      dmar: DMAR:[DMA Read] Request device [41:00.0] fault addr fff81000
>>      DMAR:[fault reason 01] Present bit in root entry is clear
>>
>> Analysis for this bug:
>>
>> The present bit is set in this function:
>>
>> static struct context_entry * device_to_context_entry(
>>                 struct intel_iommu *iommu, u8 bus, u8 devfn)
>> {
>>      ......
>>                  set_root_present(root);
>>      ......
>> }
>>
>> Calling tree:
>> ixgbe_open
>>      ixgbe_setup_tx_resources
>>          intel_alloc_coherent
>>              __intel_map_single
>>                  domain_context_mapping
>>                      domain_context_mapping_one
>>                          device_to_context_entry
>>
>> This means the present bit in the root entry is not set until the device
>> driver is loaded.
>>
>> But in the kdump kernel, some hardware devices do not know that the OS is
>> now the second kernel and that the drivers must be loaded again. Such a
>> device can issue unexpected DMA requests before it has been initialized,
>> and these trigger the DMA remapping errors.
>>
>> To fix this DMAR fault, we need to reset the bus that the device is on.
>> Resetting the device itself does not work.

This seems like something that could happen with *any* device, not
just the 82599 NIC.  Or is there something in the "kernel crash ->
kexec -> kdump kernel" path that stops DMA for most devices, but not
for the 82599?
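
The fault mechanism in the quoted analysis can be sketched as a toy
userspace model. This is not kernel code: the toy_* names are
hypothetical, and the only facts taken from the report are that the
present bit in the root entry is set on the driver's first DMA mapping
and that a request arriving before then faults. Nothing in the model is
specific to one device, which is the point of the question above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (assumption): one "present" flag per PCI bus number,
 * standing in for the present bit of each root entry. */
struct toy_root_table {
	bool present[256];
};

/* Models the effect of set_root_present() being reached when a
 * driver performs its first DMA mapping for a device on this bus. */
static void toy_driver_maps_dma(struct toy_root_table *rt, uint8_t bus)
{
	rt->present[bus] = true;
}

/* Returns true if a DMA request from this bus would be translated,
 * false if it would raise "Present bit in root entry is clear". */
static bool toy_dma_request_ok(const struct toy_root_table *rt, uint8_t bus)
{
	return rt->present[bus];
}
```

In this model, a device on bus 0x41 that is still issuing DMA from the
crashed kernel faults until some driver in the kdump kernel maps DMA
for that bus.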

>> There was also an earlier discussion:
>> https://lkml.org/lkml/2013/5/14/9
>>
>> Signed-off-by: Li, Zhen-Hua <zhen-hual@...com>
>> ---
>>   drivers/pci/quirks.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 80c2d01..5198af3 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -25,6 +25,7 @@
>>   #include <linux/sched.h>
>>   #include <linux/ktime.h>
>>   #include <asm/dma.h>  /* isa_dma_bridge_buggy */
>> +#include <linux/crash_dump.h>
>>   #include "pci.h"
>>
>>   /*
>> @@ -3832,3 +3833,13 @@ void pci_dev_specific_enable_acs(struct pci_dev *dev)
>>                 }
>>         }
>>   }
>> +
>> +#ifdef CONFIG_CRASH_DUMP
>> +void quirk_reset_buggy_devices(struct pci_dev *dev)
>> +{
>> +       if (unlikely(is_kdump_kernel()))
>> +               pci_try_reset_bus(dev->bus);
>> +}
>> +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_INTEL, 0x10f8,
>> +               PCI_CLASS_NETWORK_ETHERNET, 8, quirk_reset_buggy_devices);
>> +#endif
>>
>
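
For reference, a minimal userspace sketch of how a class-qualified fixup
such as the one declared in the patch is matched against a device. It
assumes the match compares the device's 24-bit class code, shifted right
by class_shift, with the fixup's class, and treats an all-ones value as
a wildcard. The toy_* names are hypothetical and this is not the
kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ANY_ID 0xffffffffu	/* stands in for a wildcard ID (assumption) */

/* Hypothetical mirror of the arguments given to the fixup macro. */
struct toy_fixup {
	uint32_t vendor;
	uint32_t device;
	uint32_t class;
	unsigned int class_shift;
};

/* Hypothetical device: class holds base class, subclass and prog-if. */
struct toy_dev {
	uint16_t vendor;
	uint16_t device;
	uint32_t class;
};

/* A fixup applies only if vendor, device and shifted class all match. */
static bool toy_fixup_matches(const struct toy_fixup *f,
			      const struct toy_dev *d)
{
	bool class_ok = f->class == TOY_ANY_ID ||
			f->class == (d->class >> f->class_shift);
	bool vendor_ok = f->vendor == TOY_ANY_ID || f->vendor == d->vendor;
	bool device_ok = f->device == TOY_ANY_ID || f->device == d->device;

	return class_ok && vendor_ok && device_ok;
}
```

With the values from the patch (vendor 0x8086, device 0x10f8, class
0x0200, shift 8), only an Intel Ethernet-class device with exactly that
device ID matches, so the bus reset would not run for other devices.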
