netdev - RE: igb_poll - device driver failed to check map error

Message-ID: <804857E1F29AAC47BF68C404FC60A1844D9E2F00@ORSMSX103.amr.corp.intel.com>
Date:	Fri, 15 Mar 2013 16:03:47 +0000
From:	"Allan, Bruce W" <bruce.w.allan@...el.com>
To:	"christoph.paasch@...ouvain.be" <christoph.paasch@...ouvain.be>,
	Alexander Duyck <alexander.duyck@...il.com>
CC:	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
	"Duyck, Alexander H" <alexander.h.duyck@...el.com>,
	Eric Dumazet <edumazet@...gle.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: igb_poll - device driver failed to check map error

> -----Original Message-----
> From: Christoph Paasch [mailto:christoph.paasch@...il.com] On Behalf Of
> Christoph Paasch
> Sent: Friday, March 15, 2013 12:52 AM
> To: Alexander Duyck
> Cc: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W; Duyck, Alexander
> H; Eric Dumazet; netdev@...r.kernel.org
> Subject: Re: igb_poll - device driver failed to check map error
> 
> On Thursday 14 March 2013 19:18:18 Alexander Duyck wrote:
> > On 03/12/2013 02:31 AM, Christoph Paasch wrote:
> > > Hello,
> > >
> > > I'm seeing a warning while booting my machine when DMA_API_DEBUG is set:
> > >
> > > [   36.402824] ------------[ cut here ]------------
> > > [   36.458070] WARNING: at /home/cpaasch/builder/net-next/lib/dma-debug.c:934 check_unmap+0x648/0x702()
> > > [   36.567377] Hardware name: ProLiant DL165 G7
> > > [   36.618452] igb 0000:04:00.0: DMA-API: device driver failed to check map error[device address=0x0000000233d9b232] [size=154 bytes] [mapped as single]
> > > [   36.776640] Modules linked in:
> > > [   36.815446] Pid: 0, comm: swapper/7 Not tainted 3.9.0-rc1-mptcp+ #101
> > > [   36.892515] Call Trace:
> > > [   36.921745]  <IRQ>  [<ffffffff8102ad7f>] warn_slowpath_common+0x80/0x9a
> > > [   37.001023]  [<ffffffff8102ae2d>] warn_slowpath_fmt+0x41/0x43
> > > [   37.069771]  [<ffffffff811db17f>] check_unmap+0x648/0x702
> > > [   37.134363]  [<ffffffff811db3e9>] debug_dma_unmap_page+0x50/0x52
> > > [   37.206234]  [<ffffffff8136676a>] igb_poll+0x144/0xf7c
> > > [   37.267706]  [<ffffffff8104dd19>] ? sched_clock_cpu+0x46/0xd1
> > > [   37.336456]  [<ffffffff814458ce>] net_rx_action+0xa7/0x1d0
> > > [   37.402085]  [<ffffffff81030b65>] __do_softirq+0xb4/0x16f
> > > [   37.466673]  [<ffffffff81030c90>] irq_exit+0x40/0x87
> > > [   37.526067]  [<ffffffff81002db1>] do_IRQ+0x98/0xaf
> > > [   37.583378]  [<ffffffff815210aa>] common_interrupt+0x6a/0x6a
> > > [   37.651086]  <EOI>  [<ffffffff8105d4be>] ? __tick_nohz_idle_enter+0x116/0x31f
> > > [   37.736595]  [<ffffffff81008a04>] ? default_idle+0x24/0x39
> > > [   37.802224]  [<ffffffff81008c62>] cpu_idle+0x68/0xa4
> > > [   37.861616]  [<ffffffff81519f78>] start_secondary+0x1a9/0x1ad
> > > [   37.930364] ---[ end trace 01b5bb0fd75a464c ]---
> > >
> > >
> > > It happens shortly after mounting the NFS-root filesystem.
> > >
> > > I tried to understand what is going on, but I am now at my wit's end.
> > >
> > > By adding some print-statements, here is what I found out (not sure if
> > > this is anyhow helpful):
> > >
> > > The difference between tx_buffer->time_stamp and the current 'jiffies'
> > > is up to 2000 jiffies (HZ==1000) at the first time the above warning
> > > happens (this seems too much for me). From then on, I see my print 3-4
> > > times appear but without such a big difference between the timestamps
> > > (difference around 1 and 2 jiffies).
> > >
> > > Some other stuff, I printed:
> > > tx_buffer->skb: ffff880235054c80
> > > tx_buffer->bytecount: 154
> > > tx_buffer->gso_segs: 1
> > > tx_buffer->protocol: 8
> > > tx_buffer->tx_flags 0x20
> > >
> > >
> > > One last thing:
> > > Am I right that after each call to dma_map_single/page a call to
> > > dma_mapping_error is needed? If that's the case, I have some patches
> > > that add this statement at missing places in the e1000, e1000e and
> > > ixgb driver. But these patches do not fix my above problem.
> > >
> > >
> > > Thanks for your help,
> > > Christoph
> >
> > Christoph,
> >
> > One thing that might be useful would be to reproduce this with a
> > standard 3.9-rc kernel instead of one using the multipath TCP patches.
> > This will help us to verify that the issue is reproducible with a stock
> > kernel and is not related to any ongoing work you may have only in your
> > tree.
> 
> Hello,
> 
> this is on a clean net-next kernel without any MPTCP-code.
> 
> I bisected it down to  787314c35fbb (Merge tag 'iommu-updates-v3.8' of
> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu), which simply
> introduces the debug_dma_mapping_error-checks.
> 
> Am I right with the missing calls to dma_mapping_error in e1000, e1000e
> and ixgb?
> 
> Cheers,
> Christoph
> 
> 
> 
> --
> IP Networking Lab --- http://inl.info.ucl.ac.be
> MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
> UCLouvain
> --

Hi Christoph,

You are correct re. the missing calls to dma_mapping_error and I have that on my
to-do list for e1000e, but if you have patches already feel free to send them along
(please cc the Intel wired ethernet list e1000-devel@...ts.sourceforge.net).
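
For illustration only, the check-after-map rule discussed above can be
sketched as a small userspace C mock. This is not driver code and does not
use the kernel API: every mock_* name, the MOCK_DMA_ERROR sentinel and the
"checked" flag are assumptions made up for this sketch. It loosely mimics
what the debug check appears to do: remember whether the driver tested the
result of a mapping, and complain at unmap time if it never did.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for dma_addr_t and the "bad mapping" value. */
typedef uint64_t mock_dma_addr_t;
#define MOCK_DMA_ERROR ((mock_dma_addr_t)~0ULL)

struct mock_mapping {
	mock_dma_addr_t addr;
	int checked;	/* set once the caller has tested the address */
};

/* Counts unmaps of mappings whose result was never tested. */
static int mock_unchecked_unmaps;

/* Stand-in for dma_map_single(); 'fail' forces a mapping failure. */
static void mock_map_single(struct mock_mapping *m, void *buf, int fail)
{
	assert(m != NULL);
	m->addr = fail ? MOCK_DMA_ERROR : (mock_dma_addr_t)(uintptr_t)buf;
	m->checked = 0;
}

/* Stand-in for dma_mapping_error(); records that the check happened. */
static int mock_mapping_error(struct mock_mapping *m)
{
	assert(m != NULL);
	m->checked = 1;
	return m->addr == MOCK_DMA_ERROR;
}

/* Stand-in for the unmap path; this is where the mock "warning" fires. */
static void mock_unmap_single(struct mock_mapping *m)
{
	assert(m != NULL);
	if (!m->checked)
		mock_unchecked_unmaps++;
	m->addr = MOCK_DMA_ERROR;
}

/* Transmit-side helper that follows the rule: map, then check at once. */
static int mock_tx_map_checked(struct mock_mapping *m, void *buf, int fail)
{
	mock_map_single(m, buf, fail);
	if (mock_mapping_error(m))
		return -1;	/* a real driver would drop or requeue here */
	return 0;
}
```

In this mock, a buffer mapped through mock_tx_map_checked() can be unmapped
without bumping mock_unchecked_unmaps, while calling mock_map_single()
directly and then mock_unmap_single() increments it, which is the analogue
of the warning showing up in the unmap path rather than at map time.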

Thanks,
Bruce.
