[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20190215111007.myjudx35mmwyp7km@shell.armlinux.org.uk>
Date: Fri, 15 Feb 2019 11:10:07 +0000
From: Russell King - ARM Linux admin <linux@...linux.org.uk>
To: Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
netdev@...r.kernel.org
Subject: Re: [REGRESSION 4.20] mvneta - DMA-API: device driver tries to sync
DMA memory it has not allocated
On Tue, Feb 12, 2019 at 01:59:19PM +0000, Russell King - ARM Linux admin wrote:
> Hi,
>
> Booting 4.20 on SolidRun Clearfog reliably provokes the following
> warning - this is with mvneta built in, but DSA as modules:
>
> WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
> mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
> Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
> CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
> [<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4)
> [<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
> [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
> [<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc)
> [<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
> [<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58)
> [<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424)
> [<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540)
> [<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144)
> [<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0)
> [<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98)
> [<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98)
> ...
>
> This appears to be from:
>
> if (rx_bytes <= rx_copybreak) {
> /* better copy a small frame and not unmap the DMA region */
> skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
> if (unlikely(!skb))
> goto err_drop_frame_ret_pool;
>
> dma_sync_single_range_for_cpu(dev->dev.parent,
> rx_desc->buf_phys_addr,
> MVNETA_MH_SIZE + NET_SKB_PAD,
> rx_bytes,
> DMA_FROM_DEVICE);
>
> which suggests that rx_desc->buf_phys_addr is not something that should
> be passed to dma_sync_single_range_for_cpu(). I've not been able to
> track down why that is, nor which interface is provoking that.
>
> As I don't have the details of how the buffer management hardware works
> on Armada 388, I'm unable to debug this myself.
Doing what debugging I _can_ do, it seems that this has been a long-term
error in mvneta, but one that was merely uncovered by:
commit 562e2f467e71f45f0400ebee5077eaa426d3e426
Author: Yelena Krivosheev <yelena@...vell.com>
Date: Wed Jul 18 18:10:57 2018 +0200
The buffer that is being complained about is sync'd using a device of
dev->dev.parent 'f1070000.ethernet', but is allocated by
mvneta_bm_construct() against a different device:
mvneta_bm_construct: 0x2dd85c00 +0x140 for ee113294 (f10c8000.bm)
namely 'f10c8000.bm'.
It's long-term, because it will only trigger in older kernels if we
hit the copy-break stuff, which used to do:
if (rx_bytes <= rx_copybreak) {
skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
if (unlikely(!skb)) {
...
}
dma_sync_single_range_for_cpu(dev->dev.parent,
phys_addr,
MVNETA_MH_SIZE + NET_SKB_PAD,
rx_bytes,
DMA_FROM_DEVICE);
where rx_copybreak is 256 bytes. Quite why that hasn't been seen
already, I do not know.
Looking at the code after the commit, if mvneta is used on a
non-coherent platform, then we have problems:
copy_size = min(skb_size, rx_bytes);
...
memcpy(rxq->skb->data, data + MVNETA_MH_SIZE,
copy_size);
...
if (rxq->left_size == 0) {
int size = copy_size + MVNETA_MH_SIZE;
dma_sync_single_range_for_cpu(dev->dev.parent,
phys_addr, 0,
size,
DMA_FROM_DEVICE);
Since the sync is done _after_ we've copied data from a non-coherent
buffer.
If this code has been written to assume that we're always coherent,
then is there any point at all to having the incorrect dma_sync_*()
calls at all?
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Powered by blists - more mailing lists