lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130712165300.GJ5854@hmsreliant.think-freely.org>
Date:	Fri, 12 Jul 2013 12:53:00 -0400
From:	Neil Horman <nhorman@...driver.com>
To:	Luis Henriques <luis.henriques@...onical.com>
Cc:	netdev@...r.kernel.org, Jay Cliburn <jcliburn@...il.com>,
	Chris Snook <chris.snook@...il.com>,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: [net PATCH] atl1e: fix dma mapping warnings

On Fri, Jul 12, 2013 at 05:38:10PM +0100, Luis Henriques wrote:
> Neil Horman <nhorman@...driver.com> writes:
> 
> > On Fri, Jul 12, 2013 at 05:04:26PM +0100, Luis Henriques wrote:
> >> Neil Horman <nhorman@...driver.com> writes:
> >> 
> >> > Recently had this backtrace reported:
> >> > WARNING: at lib/dma-debug.c:937 check_unmap+0x47d/0x930()
> >> > Hardware name: System Product Name
> >> > ATL1E 0000:02:00.0: DMA-API: device driver failed to check map error[device
> >> > address=0x00000000cbfd1000] [size=90 bytes] [mapped as single]
> >> > Modules linked in: xt_conntrack nf_conntrack ebtable_filter ebtables
> >> > ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt
> >> > iTCO_vendor_support snd_hda_intel acpi_cpufreq mperf coretemp btrfs zlib_deflate
> >> > snd_hda_codec snd_hwdep microcode raid6_pq libcrc32c snd_seq usblp serio_raw xor
> >> > snd_seq_device joydev snd_pcm snd_page_alloc snd_timer snd lpc_ich i2c_i801
> >> > soundcore mfd_core atl1e asus_atk0110 ata_generic pata_acpi radeon i2c_algo_bit
> >> > drm_kms_helper ttm drm i2c_core pata_marvell uinput
> >> > Pid: 314, comm: systemd-journal Not tainted 3.9.0-0.rc6.git2.3.fc19.x86_64 #1
> >> > Call Trace:
> >> >  <IRQ>  [<ffffffff81069106>] warn_slowpath_common+0x66/0x80
> >> >  [<ffffffff8106916c>] warn_slowpath_fmt+0x4c/0x50
> >> >  [<ffffffff8138151d>] check_unmap+0x47d/0x930
> >> >  [<ffffffff810ad048>] ? sched_clock_cpu+0xa8/0x100
> >> >  [<ffffffff81381a2f>] debug_dma_unmap_page+0x5f/0x70
> >> >  [<ffffffff8137ce30>] ? unmap_single+0x20/0x30
> >> >  [<ffffffffa01569a1>] atl1e_intr+0x3a1/0x5b0 [atl1e]
> >> >  [<ffffffff810d53fd>] ? trace_hardirqs_off+0xd/0x10
> >> >  [<ffffffff81119636>] handle_irq_event_percpu+0x56/0x390
> >> >  [<ffffffff811199ad>] handle_irq_event+0x3d/0x60
> >> >  [<ffffffff8111cb6a>] handle_fasteoi_irq+0x5a/0x100
> >> >  [<ffffffff8101c36f>] handle_irq+0xbf/0x150
> >> >  [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50
> >> >  [<ffffffff81073b10>] ? irq_enter+0x50/0xa0
> >> >  [<ffffffff8172738d>] do_IRQ+0x4d/0xc0
> >> >  [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50
> >> >  [<ffffffff8171c6b2>] common_interrupt+0x72/0x72
> >> >  <EOI>  [<ffffffff810db5b2>] ? lock_release+0xc2/0x310
> >> >  [<ffffffff8109ea04>] lg_local_unlock_cpu+0x24/0x50
> >> >  [<ffffffff811dcb2f>] file_sb_list_del+0x3f/0x50
> >> >  [<ffffffff811dcb6d>] fput+0x2d/0xc0
> >> >  [<ffffffff811d8ea1>] filp_close+0x61/0x90
> >> >  [<ffffffff811fae4d>] __close_fd+0x8d/0x150
> >> >  [<ffffffff811d8ef0>] sys_close+0x20/0x50
> >> >  [<ffffffff81725699>] system_call_fastpath+0x16/0x1b
> >> >
> >> > The usual straighforward failure to check for dma_mapping_error after a map
> >> > operation is completed.
> >> >
> >> > This patch should fix it, the reporter wandered off after filing this bz:
> >> > https://bugzilla.redhat.com/show_bug.cgi?id=954170
> >> >
> >> > and I don't have hardware to test, but the fix is pretty straightforward, so I
> >> > figured I'd post it for review.
> >> >
> >> > Signed-off-by: Neil Horman <nhorman@...driver.com>
> >> > CC: Jay Cliburn <jcliburn@...il.com>
> >> > CC: Chris Snook <chris.snook@...il.com>
> >> > CC: "David S. Miller" <davem@...emloft.net>
> >> > ---
> >> >  drivers/net/ethernet/atheros/atl1e/atl1e_main.c | 26 +++++++++++++++++++++++--
> >> >  1 file changed, 24 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
> >> > index 895f537..53a725b 100644
> >> > --- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
> >> > +++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
> >> > @@ -1665,7 +1665,7 @@ check_sum:
> >> >  	return 0;
> >> >  }
> >> >  
> >> > -static void atl1e_tx_map(struct atl1e_adapter *adapter,
> >> > +static int atl1e_tx_map(struct atl1e_adapter *adapter,
> >> >  		      struct sk_buff *skb, struct atl1e_tpd_desc *tpd)
> >> >  {
> >> >  	struct atl1e_tpd_desc *use_tpd = NULL;
> >> > @@ -1677,6 +1677,7 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
> >> >  	u16 nr_frags;
> >> >  	u16 f;
> >> >  	int segment;
> >> > +	int ring_start = adapter->tx_ring.next_to_use;
> >> >  
> >> >  	nr_frags = skb_shinfo(skb)->nr_frags;
> >> >  	segment = (tpd->word3 >> TPD_SEGMENT_EN_SHIFT) & TPD_SEGMENT_EN_MASK;
> >> > @@ -1689,6 +1690,9 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
> >> >  		tx_buffer->length = map_len;
> >> >  		tx_buffer->dma = pci_map_single(adapter->pdev,
> >> >  					skb->data, hdr_len, PCI_DMA_TODEVICE);
> >> > +		if (dma_mapping_error(&adapter->pdev->dev, tx_buffer->dma))
> >> > +			return -ENOSPC;
> >> > +
> >> >  		ATL1E_SET_PCIMAP_TYPE(tx_buffer, ATL1E_TX_PCIMAP_SINGLE);
> >> >  		mapped_len += map_len;
> >> >  		use_tpd->buffer_addr = cpu_to_le64(tx_buffer->dma);
> >> > @@ -1715,6 +1719,13 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
> >> >  		tx_buffer->dma =
> >> >  			pci_map_single(adapter->pdev, skb->data + mapped_len,
> >> >  					map_len, PCI_DMA_TODEVICE);
> >> > +
> >> > +		if (dma_mapping_error(&adapter->pdev->dev, tx_buffer->dma)) {
> >> > +			/* Reset the tx rings next pointer */
> >> > +			adapter->tx_ring.next_to_use = ring_start;
> >> > +			return -ENOSPC;
> >> > +		}
> >> > +
> >> >  		ATL1E_SET_PCIMAP_TYPE(tx_buffer, ATL1E_TX_PCIMAP_SINGLE);
> >> >  		mapped_len  += map_len;
> >> >  		use_tpd->buffer_addr = cpu_to_le64(tx_buffer->dma);
> >> > @@ -1750,6 +1761,13 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
> >> >  							  (i * MAX_TX_BUF_LEN),
> >> >  							  tx_buffer->length,
> >> >  							  DMA_TO_DEVICE);
> >> > +
> >> > +			if (dma_mapping_error(&adapter->pdev->dev, tx_buffer->dma)) {
> >> > +				/* Reset the ring next to use pointer */
> >> > +				adapter->tx_ring.next_to_use = ring_start;
> >> > +				return -ENOSPC;
> >> > +			}
> >> > +
> >> >  			ATL1E_SET_PCIMAP_TYPE(tx_buffer, ATL1E_TX_PCIMAP_PAGE);
> >> >  			use_tpd->buffer_addr = cpu_to_le64(tx_buffer->dma);
> >> >  			use_tpd->word2 = (use_tpd->word2 & (~TPD_BUFLEN_MASK)) |
> >> > @@ -1767,6 +1785,7 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
> >> >  	/* The last buffer info contain the skb address,
> >> >  	   so it will be free after unmap */
> >> >  	tx_buffer->skb = skb;
> >> > +	return 0;
> >> 
> >> Looks good to me, except from return statement in a void function.
> >> 
> >> Cheers,
> >> -- 
> >> Luis
> >> 
> > Actually scratch that last note, I don't see any return statements in fuctions
> > that return void.  I modified atl1e_tx_map to return an int so we could do the
> > right thing in atl1e_xmit_frame.  Is there something I'm missing?
> > Neil
> 
> Ups, you're right.  Sorry for the noise.
> 
no worries, appreciate the extra eyes.

> So, just as a suggestion, maybe your patch should also change
> atl1e_xmit_frame to do something with the new return code (maybe
> return NETDEV_TX_BUSY...)
> 
I had thought about that, and I wasn't sure I should do that, without also
stopping the tx queue, which I didn't want to do.  My reasoning was that I
couldn't be sure if the dma error was going to be a one time, transient issue,
or a chronic, recurring issue.  If its transient, then it would be fine to just
return TX_BUSY, let the scheduler requeue the skb and go on about our business,
but if the dma error is longer lived, you'll wind up backing up the queue
needlessly.

I could alternatively stop the queue and return TX_BUSY, but if the problem is
transient (I think in most cases it will be, but I'm not sure), then I've
stopped the queue without any reason, and have to find some way/indicator to
start it again.

Looking at other drivers, I see most stop the queue whenever they return busy,
but do so in code paths that guarantee there are frames pending in their tx
hardware (implying that they will get a tx completion interrupt, which they can
use to re-awaken the queue).

We don't have that guarantee when the dma mapping code errors out on us, not
without doing alot of additional checking.  Seem easier to just call the frame
lost.  But I'll follow consensus on this, would you prefer I stop the queue,
return BUSY, and find a way to wake the queue up again?

Neil
 
> Cheers,
> -- 
> Luis
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ