Date: Tue, 06 Jul 2010 17:51:41 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Chris Li <lkml@...isli.org>
Cc: David Woodhouse <dwmw2@...radead.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Matthew Wilcox <willy@...ux.intel.com>
Subject: Re: BUG in drivers/dma/ioat/dma_v2.c:314
[ adding Matthew as one of the last people to touch mm/dmapool.c ]
On Tue, 2010-07-06 at 16:40 -0700, Chris Li wrote:
> On Mon, Jul 5, 2010 at 3:16 AM, David Woodhouse <dwmw2@...radead.org> wrote:
> > On Fri, 2010-07-02 at 20:00 +0100, Chris Li wrote:
> >> But I don't see the line that prints out that the BIOS is lying.
> >
> > Hrm. Want to augment the dmar_find_matched_drhd_unit() function to
> > _always_ print the DRHD returned for the offending PCI device? And if
> > that still doesn't show, make it print pdev->vendor, pdev->device and
> > the returned DRHD pointer for _every_ call?
>
> I just did some experiments; my PCI device ID is PCI_DEVICE_ID_INTEL_ESB2_0
> (0x2670) instead of PCI_DEVICE_ID_INTEL_IOAT_SNB.
No, it should be PCI_DEVICE_ID_INTEL_IOAT_SNB (0x402f) for the DMA
engine at 00:0f.0. PCI_DEVICE_ID_INTEL_ESB2_0 is the LPC controller at
00:1f.0.
> That seems to be what prevents the warning from being printed. I am not
> sure the warning should always be printed; I'm just curious why it did
> not trigger.
It should always trigger, and I have verified as much with the attached
replacement patch (by forcing the error on a working system), but we run
into a new problem. dma_pool_alloc() assumes that any DMA mapping error
is transient and keeps retrying, so the messages below repeat forever.
Do we need a variant of dma_mapping_error() that distinguishes a
permanent failure from a transient ENOMEM? The driver can handle the
allocation failure, but it never gets the chance.
------------[ cut here ]------------
WARNING: at drivers/pci/dmar.c:574 dmar_find_matched_drhd_unit+0xe4/0xfa()
Hardware name: [redacted to protect the innocent]
BIOS wrongly assigned I/OAT IOMMU 5: reg_base_addr fe71a000 cap 4900800c2f0462 ecap e01
Modules linked in: ioatdma(+) dca ipv6 snd_pcsp snd_pcm snd_timer snd soundcore i2c_i801 snd_page_alloc serio_raw i2c_core joydev
Pid: 1166, comm: modprobe Not tainted 2.6.35-rc3+ #2
Call Trace:
[<ffffffff8104bfd0>] warn_slowpath_common+0x85/0x9d
[<ffffffff8104c043>] warn_slowpath_fmt_taint+0x3f/0x41
[<ffffffff8125dd4b>] dmar_find_matched_drhd_unit+0xe4/0xfa
[<ffffffff8126179d>] get_domain_for_dev.clone.3+0x111/0x471
[<ffffffff81261cbb>] get_valid_domain_for_dev+0x26/0x9a
[<ffffffff81261f51>] __intel_map_single+0x4c/0x175
[<ffffffff81262184>] intel_alloc_coherent+0xc7/0xef
[<ffffffff810edcd2>] dma_pool_alloc+0x179/0x2ab
[<ffffffffa00ed606>] ? kzalloc+0x14/0x16 [ioatdma]
[<ffffffffa00efe58>] ioat2_alloc_chan_resources+0x4f/0x219 [ioatdma]
[<ffffffffa00f33b9>] ioat_dma_self_test+0x94/0x2af [ioatdma]
[<ffffffff8109bff2>] ? devm_request_threaded_irq+0x98/0xaa
[<ffffffffa00f31cd>] ioat_probe+0x338/0x3aa [ioatdma]
[<ffffffffa00f3657>] ioat2_dma_probe+0x83/0x106 [ioatdma]
[<ffffffffa00f2ded>] ioat_pci_probe+0x133/0x195 [ioatdma]
[<ffffffff8124b539>] local_pci_probe+0x17/0x1b
[<ffffffff8124c2f5>] pci_device_probe+0xcd/0xfd
[<ffffffff812ee5f5>] ? driver_sysfs_add+0x4c/0x71
[<ffffffff812ee81a>] driver_probe_device+0x12f/0x240
[<ffffffff812ee97a>] __driver_attach+0x4f/0x6b
[<ffffffff812ee92b>] ? __driver_attach+0x0/0x6b
[<ffffffff812edc66>] bus_for_each_dev+0x53/0x88
[<ffffffff812ee554>] driver_attach+0x1e/0x20
[<ffffffff812ee19a>] bus_add_driver+0xd5/0x23b
[<ffffffff812eec54>] driver_register+0x9d/0x10e
[<ffffffff8124c521>] __pci_register_driver+0x58/0xc8
[<ffffffffa00fc000>] ? ioat_init_module+0x0/0x85 [ioatdma]
[<ffffffffa00fc000>] ? ioat_init_module+0x0/0x85 [ioatdma]
[<ffffffffa00fc06d>] ioat_init_module+0x6d/0x85 [ioatdma]
[<ffffffff81002069>] do_one_initcall+0x5e/0x159
[<ffffffff8107bd01>] sys_init_module+0xa1/0x1e0
[<ffffffff81009c32>] system_call_fastpath+0x16/0x1b
---[ end trace 02c1ac1f56dc9544 ]---
Disabling lock debugging due to kernel taint
IOMMU: can't find DMAR for device 0000:00:0f.0
Allocating domain for 0000:00:0f.0 failed
IOMMU: can't find DMAR for device 0000:00:0f.0
Allocating domain for 0000:00:0f.0 failed
[...ad infinitum...]
--
Dan
[Attachment: "ioat-catch-broken-vtd-v2.patch" (text/x-patch, 1671 bytes)]