Message-ID: <5434975C.9000709@hp.com>
Date: Wed, 08 Oct 2014 09:46:04 +0800
From: "Li, ZhenHua" <zhen-hual@...com>
To: Alexander Duyck <alexander.duyck@...il.com>,
Bjorn Helgaas <bhelgaas@...gle.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
Joerg Roedel <joro@...tes.org>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
Bruce Allan <bruce.w.allan@...el.com>,
Carolyn Wyborny <carolyn.wyborny@...el.com>,
Don Skidmore <donald.c.skidmore@...el.com>,
Greg Rose <gregory.v.rose@...el.com>,
Alex Duyck <alexander.h.duyck@...el.com>,
John Ronciak <john.ronciak@...el.com>,
Mitch Williams <mitch.a.williams@...el.com>,
Linux NICS <linux.nics@...el.com>,
"e1000-devel@...ts.sourceforge.net"
<e1000-devel@...ts.sourceforge.net>, linda.knippers@...com
Subject: Re: [PATCH 1/1] pci/quirks: fix a dmar fault for intel 82599 card
Well, then I will create a patch for ALL PCIe devices.
On 10/03/2014 10:28 PM, Alexander Duyck wrote:
> On 10/02/2014 08:09 AM, Bjorn Helgaas wrote:
>> On Tue, Sep 30, 2014 at 12:15 AM, Li, ZhenHua <zhen-hual@...com> wrote:
>>> Adding Joerg to the CC list, since this is also related to the IOMMU module.
>>>
>>> Joerg,
>>> There was an earlier attempt to fix this DMAR fault:
>>> https://lkml.org/lkml/2014/8/18/118
>>>
>>> This patch is trying to fix the same thing.
>>>
>>>
>>> Zhenhua
>>>
>>> On 09/30/2014 02:09 PM, Li, Zhen-Hua wrote:
>>>> On an HP system with an Intel Corporation 82599 Ethernet adapter, when
>>>> the kernel crashes and the kdump kernel boots with intel_iommu=on, there
>>>> may be some unexpected DMA requests from this adapter, which cause DMA
>>>> remapping faults like:
>>>> dmar: DRHD: handling fault status reg 102
>>>> dmar: DMAR:[DMA Read] Request device [41:00.0] fault addr fff81000
>>>> DMAR:[fault reason 01] Present bit in root entry is clear
>>>>
>>>> Analysis for this bug:
>>>>
>>>> The present bit is set in this function:
>>>>
>>>> static struct context_entry * device_to_context_entry(
>>>>                 struct intel_iommu *iommu, u8 bus, u8 devfn)
>>>> {
>>>>     ......
>>>>     set_root_present(root);
>>>>     ......
>>>> }
>>>>
>>>> Calling tree:
>>>>     ixgbe_open
>>>>       ixgbe_setup_tx_resources
>>>>         intel_alloc_coherent
>>>>           __intel_map_single
>>>>             domain_context_mapping
>>>>               domain_context_mapping_one
>>>>                 device_to_context_entry
>>>>
>>>> This means, the present bit in root entry will not be set until the device
>>>> driver is loaded.
>>>>
>>>> But in the kdump kernel, the hardware does not know that a second kernel
>>>> is now running and that its driver will be loaded again. As a result, the
>>>> device may issue unexpected DMA requests before it has been initialized,
>>>> and these requests trigger the DMA remapping faults.
>>>>
>>>> To fix this DMAR fault, we need to reset the bus that this device is on.
>>>> Resetting the device itself does not work.
>> This seems like something that could happen with *any* device, not
>> just the 82599 NIC. Or is there something in the "kernel crash ->
>> kexec -> kdump kernel" path that stops DMA for most devices, but not
>> for the 82599?
>>
>
> This is an *any* device problem. Specifically any device that is doing
> active DMA when a kdump kernel is triggered will cause this issue since
> the IOMMU will not have valid mappings for the DMA events until the
> device driver itself is loaded and resets the device.
>
> Thanks,
>
> Alex
>
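
For readers following the analysis quoted above, here is a small userspace
sketch of the behaviour being described. It is not kernel code and not part
of the patch: every toy_* name, the table layout, and the value used for the
context-entry fault are assumptions made for illustration. It only models
the point made in the thread: the present bit for a bus is set lazily on
the first mapping for a device, so a DMA request that arrives before the
driver has mapped anything is reported as a fault.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_BUSES 256
#define TOY_NUM_DEVFN 256

/* Result of a simulated DMA request. Value 1 mirrors "fault reason 01"
 * from the log in the thread; value 2 is just a toy value here. */
enum toy_dma_result {
    TOY_DMA_OK = 0,
    TOY_FAULT_ROOT_NOT_PRESENT = 1,
    TOY_FAULT_CTX_NOT_PRESENT = 2
};

/* Hypothetical, heavily simplified root entry: one present bit per bus
 * plus one present bit per device/function on that bus. */
struct toy_root_entry {
    bool present;
    bool ctx_present[TOY_NUM_DEVFN];
};

struct toy_iommu {
    struct toy_root_entry root[TOY_NUM_BUSES];
    unsigned int faults;    /* number of faults reported so far */
};

/* Fresh tables, as in a newly booted kdump kernel: nothing is present. */
void toy_iommu_init(struct toy_iommu *iommu)
{
    memset(iommu, 0, sizeof(*iommu));
}

/* Models the first DMA mapping done by a driver: only at this point do
 * the root and context entries for the device become present. */
void toy_map_device(struct toy_iommu *iommu, uint8_t bus, uint8_t devfn)
{
    iommu->root[bus].present = true;
    iommu->root[bus].ctx_present[devfn] = true;
}

/* Models a DMA request from a device arriving at the remapping hardware. */
enum toy_dma_result toy_dma_request(struct toy_iommu *iommu,
                                    uint8_t bus, uint8_t devfn)
{
    if (!iommu->root[bus].present) {
        iommu->faults++;
        return TOY_FAULT_ROOT_NOT_PRESENT;
    }
    if (!iommu->root[bus].ctx_present[devfn]) {
        iommu->faults++;
        return TOY_FAULT_CTX_NOT_PRESENT;
    }
    return TOY_DMA_OK;
}
```

In this model, a request from bus 0x41, devfn 0x00 (the 41:00.0 device in
the log) issued right after toy_iommu_init() returns
TOY_FAULT_ROOT_NOT_PRESENT, and the same request succeeds once
toy_map_device() has been called for that device. Nothing in the sketch is
specific to one device, which matches the "any device" observation.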