Message-ID: <3729150.HPUjKjXiGc@cpaasch-mac>
Date:	Sat, 16 Mar 2013 12:07:39 +0100
From:	Christoph Paasch <christoph.paasch@...ouvain.be>
To:	Alexander Duyck <alexander.h.duyck@...el.com>
Cc:	Alexander Duyck <alexander.duyck@...il.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	Bruce Allan <bruce.w.allan@...el.com>,
	Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org
Subject: Re: igb_poll - device driver failed to check map error

On Friday 15 March 2013 16:08:31 Alexander Duyck wrote:
> On 03/15/2013 12:52 AM, Christoph Paasch wrote:
> > On Thursday 14 March 2013 19:18:18 Alexander Duyck wrote:
> >> On 03/12/2013 02:31 AM, Christoph Paasch wrote:
> >>> Hello,
> >>> 
> >>> I'm seeing a warning while booting my machine when DMA_API_DEBUG is set:
> >>> 
> >>> [   36.402824] ------------[ cut here ]------------
> >>> [   36.458070] WARNING: at /home/cpaasch/builder/net-next/lib/dma-debug.c:934 check_unmap+0x648/0x702()
> >>> [   36.567377] Hardware name: ProLiant DL165 G7
> >>> [   36.618452] igb 0000:04:00.0: DMA-API: device driver failed to check map error[device address=0x0000000233d9b232] [size=154 bytes] [mapped as single]
> >>> [   36.776640] Modules linked in:
> >>> [   36.815446] Pid: 0, comm: swapper/7 Not tainted 3.9.0-rc1-mptcp+ #101
> >>> [   36.892515] Call Trace:
> >>> [   36.921745]  <IRQ>  [<ffffffff8102ad7f>] warn_slowpath_common+0x80/0x9a
> >>> [   37.001023]  [<ffffffff8102ae2d>] warn_slowpath_fmt+0x41/0x43
> >>> [   37.069771]  [<ffffffff811db17f>] check_unmap+0x648/0x702
> >>> [   37.134363]  [<ffffffff811db3e9>] debug_dma_unmap_page+0x50/0x52
> >>> [   37.206234]  [<ffffffff8136676a>] igb_poll+0x144/0xf7c
> >>> [   37.267706]  [<ffffffff8104dd19>] ? sched_clock_cpu+0x46/0xd1
> >>> [   37.336456]  [<ffffffff814458ce>] net_rx_action+0xa7/0x1d0
> >>> [   37.402085]  [<ffffffff81030b65>] __do_softirq+0xb4/0x16f
> >>> [   37.466673]  [<ffffffff81030c90>] irq_exit+0x40/0x87
> >>> [   37.526067]  [<ffffffff81002db1>] do_IRQ+0x98/0xaf
> >>> [   37.583378]  [<ffffffff815210aa>] common_interrupt+0x6a/0x6a
> >>> [   37.651086]  <EOI>  [<ffffffff8105d4be>] ? __tick_nohz_idle_enter+0x116/0x31f
> >>> [   37.736595]  [<ffffffff81008a04>] ? default_idle+0x24/0x39
> >>> [   37.802224]  [<ffffffff81008c62>] cpu_idle+0x68/0xa4
> >>> [   37.861616]  [<ffffffff81519f78>] start_secondary+0x1a9/0x1ad
> >>> [   37.930364] ---[ end trace 01b5bb0fd75a464c ]---
> >>> 
> >>> 
> >>> It happens shortly after mounting the NFS-root filesystem.
> >>> 
> >>> I tried to understand what is going on, but I am now at my wit's end.
> >>> 
> >>> By adding some print statements, here is what I found out (not sure if
> >>> this is helpful at all):
> >>> 
> >>> The difference between tx_buffer->time_stamp and the current 'jiffies'
> >>> is up to 2000 jiffies (HZ==1000) the first time the above warning
> >>> happens (this seems too much to me). From then on, my print appears
> >>> 3-4 more times, but without such a big difference between the
> >>> timestamps (around 1 to 2 jiffies).
> >>> 
> >>> Some other values I printed:
> >>> tx_buffer->skb: ffff880235054c80
> >>> tx_buffer->bytecount: 154
> >>> tx_buffer->gso_segs: 1
> >>> tx_buffer->protocol: 8
> >>> tx_buffer->tx_flags: 0x20
> >>> 
> >>> 
> >>> One last thing:
> >>> Am I right that after each call to dma_map_single/page a call to
> >>> dma_mapping_error is needed? If that's the case, I have some patches
> >>> that add this check at the missing places in the e1000, e1000e and
> >>> ixgb drivers. But these patches do not fix my problem above.
> >>> 
> >>> 
> >>> Thanks for your help,
> >>> Christoph
> >> 
> >> Christoph,
> >> 
> >> One thing that might be useful would be to reproduce this with a
> >> standard 3.9-rc kernel instead of one using the multipath TCP patches.
> >> This will help us to verify that the issue is reproducible with a stock
> >> kernel and is not related to any ongoing work you may have only in your
> >> tree.
> > 
> > Hello,
> > 
> > this is on a clean net-next kernel without any MPTCP-code.
> > 
> > I bisected it down to 787314c35fbb (Merge tag 'iommu-updates-v3.8' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu), which simply
> > introduces the debug_dma_mapping_error checks.
> > 
> > Am I right about the missing calls to dma_mapping_error in e1000,
> > e1000e and ixgb?
> > 
> > Cheers,
> > Christoph
> 
> Christoph,
> 
> The issue you are seeing may be caused by reuse of the buffer that
> triggers the error.  I was able to reproduce this issue occasionally
> with pktgen if I cloned the skb.  What may be happening is that the
> buffer is being mapped in the transmit path on one CPU while on another
> CPU the buffer is being cleaned.  Since the output of each mapping is
> the physical address, there is nothing to make each mapping unique, and
> I suspect this is resulting in false hits.
> 
> You should be able to verify this if you were to check the skb->users
> count as well as the dataref value in the skb_shared_info.  I suspect
> either the users count or the dataref will be greater than 1.

Both users and dataref are equal to 1, before the call to dev_kfree_skb_any
as well as after dma_unmap_single fails.
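
For anyone following the thread, the false-hit theory quoted above can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: the names toy_map, toy_check and toy_unmap are made up for illustration, the tracker identifies a mapping by its address alone, and a lookup always returns the first live entry with that address. It is not the logic in lib/dma-debug.c.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX 8

/* Hypothetical tracker entry: a mapping is identified only by its address. */
struct toy_entry {
	bool     in_use;
	uint64_t addr;
	bool     err_checked;	/* set once the mapping result was checked */
};

static struct toy_entry toy_table[TOY_MAX];

/* Return the first live entry with this address, or NULL. */
static struct toy_entry *toy_find(uint64_t addr)
{
	for (int i = 0; i < TOY_MAX; i++)
		if (toy_table[i].in_use && toy_table[i].addr == addr)
			return &toy_table[i];
	return NULL;
}

/* Record a new, not yet checked, mapping in the first free slot. */
static void toy_map(uint64_t addr)
{
	for (int i = 0; i < TOY_MAX; i++) {
		if (!toy_table[i].in_use) {
			toy_table[i] = (struct toy_entry){ true, addr, false };
			return;
		}
	}
}

/* Mark "the" mapping for this address as checked (first match only). */
static void toy_check(uint64_t addr)
{
	struct toy_entry *e = toy_find(addr);

	if (e)
		e->err_checked = true;
}

/* Drop the first match; return true if it was never marked as checked. */
static bool toy_unmap(uint64_t addr)
{
	struct toy_entry *e = toy_find(addr);
	bool warn;

	if (!e)
		return false;
	warn = !e->err_checked;
	e->in_use = false;
	return warn;
}

/* Buffer is freed and re-mapped elsewhere before the first unmap. */
int toy_warnings_free_before_unmap(void)
{
	const uint64_t addr = 0x1000;	/* arbitrary address for illustration */
	int warnings = 0;

	memset(toy_table, 0, sizeof(toy_table));
	toy_map(addr); toy_check(addr);	/* first user maps and checks */
	toy_map(addr); toy_check(addr);	/* second user: check hits the old entry */
	warnings += toy_unmap(addr);	/* first user unmaps */
	warnings += toy_unmap(addr);	/* second user unmaps an unchecked entry */
	return warnings;
}

/* Buffer is unmapped before it can be freed and re-mapped. */
int toy_warnings_unmap_before_free(void)
{
	const uint64_t addr = 0x1000;
	int warnings = 0;

	memset(toy_table, 0, sizeof(toy_table));
	toy_map(addr); toy_check(addr);
	warnings += toy_unmap(addr);
	toy_map(addr); toy_check(addr);
	warnings += toy_unmap(addr);
	return warnings;
}
```

In this model both users check their mapping, yet the overlapping case still reports one unchecked unmap, because the second check lands on the first entry.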

> You might also try testing the patch below to see if it has any effect.
> All it does is reorder the free and the unmap so that the buffer is not
> freed for reuse until after we have checked it in the unmap path.

I tested your patch, and it fixes my issue. Feel free to add a "Tested-by" to 
the official patch.
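
As a rough illustration of why the order matters, here is a minimal userspace sketch. It assumes a made-up struct toy_buf with two flags standing in for the state of one transmit buffer; the assignments only stand in for dev_kfree_skb_any() and dma_unmap_single() and do not call anything in the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state of one transmit buffer. */
struct toy_buf {
	bool mapped;	/* still has a live DMA mapping */
	bool allocated;	/* still owned by the driver, not yet freed */
};

/* True while the buffer could be handed out again although still mapped. */
static bool toy_reuse_window(const struct toy_buf *b)
{
	return b->mapped && !b->allocated;
}

/* Order before the patch: free first, then unmap. */
bool toy_clean_free_first(void)
{
	struct toy_buf b = { .mapped = true, .allocated = true };
	bool window = false;

	b.allocated = false;		/* stands in for dev_kfree_skb_any() */
	window |= toy_reuse_window(&b);
	b.mapped = false;		/* stands in for dma_unmap_single() */
	window |= toy_reuse_window(&b);
	return window;
}

/* Order after the patch: unmap first, then free. */
bool toy_clean_unmap_first(void)
{
	struct toy_buf b = { .mapped = true, .allocated = true };
	bool window = false;

	b.mapped = false;		/* stands in for dma_unmap_single() */
	window |= toy_reuse_window(&b);
	b.allocated = false;		/* stands in for dev_kfree_skb_any() */
	window |= toy_reuse_window(&b);
	return window;
}
```

With the free first, there is a step at which the buffer is already free but still mapped; with the unmap first, that step never occurs.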


Cheers,
Christoph

> ---
>  drivers/net/ethernet/intel/igb/igb_main.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index 4dbd629..0f9c324 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -5959,15 +5959,15 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
>  		total_bytes += tx_buffer->bytecount;
>  		total_packets += tx_buffer->gso_segs;
> 
> -		/* free the skb */
> -		dev_kfree_skb_any(tx_buffer->skb);
> -
>  		/* unmap skb header data */
>  		dma_unmap_single(tx_ring->dev,
>  				 dma_unmap_addr(tx_buffer, dma),
>  				 dma_unmap_len(tx_buffer, len),
>  				 DMA_TO_DEVICE);
> 
> +		/* free the skb */
> +		dev_kfree_skb_any(tx_buffer->skb);
> +
>  		/* clear tx_buffer data */
>  		tx_buffer->skb = NULL;
>  		dma_unmap_len_set(tx_buffer, len, 0);
-- 
IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://multipath-tcp.org
UCLouvain