Message-ID: <AM6PR0402MB3607E9BD414FF850D577F76EFF6D0@AM6PR0402MB3607.eurprd04.prod.outlook.com>
Date: Thu, 2 Jul 2020 04:18:16 +0000
From: Andy Duan <fugang.duan@....com>
To: Kegl Rohit <keglrohit@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] net: ethernet: freescale: fec: copybreak handling
throughput, dma_sync_* optimisations allowed?
From: Kegl Rohit <keglrohit@...il.com> Sent: Thursday, July 2, 2020 2:45 AM
> fec_enet_copybreak(u32 length, ...) uses
>
>     dma_sync_single_for_cpu(&fep->pdev->dev,
>                             fec32_to_cpu(bdp->cbd_bufaddr),
>                             FEC_ENET_RX_FRSIZE - fep->rx_align,
>                             DMA_FROM_DEVICE);
>     if (!swap)
>             memcpy(new_skb->data, (*skb)->data, length);
>
> to sync the descriptor data buffer and memcpy the data to the new skb
> without calling dma_unmap().
dma_sync_* is enough; there is no need for a dma_unmap_*/dma_map_* round trip here, which would add a heavy load.
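For reference, a rough sketch of the two RX paths being compared (placeholder names and sizes, not the driver's exact code): copybreak keeps the original buffer mapped and only syncs it for the copy, while the non-copybreak path pays for a full unmap plus a fresh map of a replacement buffer.

    #include <linux/dma-mapping.h>
    #include <linux/skbuff.h>
    #include <linux/string.h>

    /* Copybreak: the small frame is copied out, the RX buffer stays mapped. */
    static void rx_copybreak_path_sketch(struct device *dev, dma_addr_t buf_dma,
                                         void *buf_cpu, size_t map_size,
                                         struct sk_buff *new_skb, u32 len)
    {
            /* Invalidate so the CPU sees the frame the device just wrote. */
            dma_sync_single_for_cpu(dev, buf_dma, map_size, DMA_FROM_DEVICE);
            memcpy(new_skb->data, buf_cpu, len);
    }

    /* Non-copybreak: the skb is passed up the stack, so the buffer has to be
     * unmapped and a replacement buffer mapped for the descriptor
     * (error checking omitted). */
    static dma_addr_t rx_full_path_sketch(struct device *dev, dma_addr_t buf_dma,
                                          size_t map_size, void *new_buf)
    {
            dma_unmap_single(dev, buf_dma, map_size, DMA_FROM_DEVICE);
            return dma_map_single(dev, new_buf, map_size, DMA_FROM_DEVICE);
    }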
> Later in fec_enet_rx_queue() the dma descriptor buffer is synced again in the
> opposite direction.
>
>     if (is_copybreak) {
>         dma_sync_single_for_device(&fep->pdev->dev,
>                                    fec32_to_cpu(bdp->cbd_bufaddr),
>                                    FEC_ENET_RX_FRSIZE - fep->rx_align,
>                                    DMA_FROM_DEVICE);
>     }
>
dma_sync_single_for_cpu(DMA_FROM_DEVICE)
    -> __dma_inv_area      # invalidate the area
dma_sync_single_for_device(DMA_FROM_DEVICE)
    -> __dma_inv_area      # invalidate the area
    -> __dma_clean_area    # clean the area
dma_sync_single_for_cpu() and dma_sync_single_for_device() are used in pairs here, so there is no problem with this usage.
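Put together, the pairing looks roughly like the skeleton below (same placeholder names as the sketch above; the real RX poll loop walks the bufdesc ring and also handles the swap and error cases). The two sync calls simply bracket the CPU's access to a buffer that stays mapped DMA_FROM_DEVICE for its whole lifetime.

    static void rx_poll_pair_sketch(struct device *dev, dma_addr_t buf_dma,
                                    void *buf_cpu, size_t map_size,
                                    struct sk_buff *small_skb, u32 len,
                                    bool is_copybreak)
    {
            if (is_copybreak) {
                    /* CPU takes ownership of the still-mapped buffer ... */
                    dma_sync_single_for_cpu(dev, buf_dma, map_size,
                                            DMA_FROM_DEVICE);
                    memcpy(small_skb->data, buf_cpu, len);
                    /* ... and hands it back to the device before the
                     * descriptor is rearmed for the next frame. */
                    dma_sync_single_for_device(dev, buf_dma, map_size,
                                               DMA_FROM_DEVICE);
            }
            /* rearm the RX descriptor for hardware here */
    }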
> Now the two main questions:
> 1. Is it necessary to call dma_sync_single_for_cpu for the whole buffer size
> (FEC_ENET_RX_FRSIZE - fep->rx_align), wouldn't syncing the real packet
> length which is accessed by memcpy be enough?
> Like so: dma_sync_single_for_cpu(&fep->pdev->dev,
> fec32_to_cpu(bdp->cbd_bufaddr), (u32) length, DMA_FROM_DEVICE);
In general you do not know the size of the next frame, and you cannot guarantee that the buffer is clean, so the whole area is invalidated for the next frame.
On some arm64 cores such as the Cortex-A53, the dcache invalidate behaves as a flush (clean) + invalidate, and speculative prefetch may touch the area; if the area holds dirty lines, that can cause dirty data to be flushed back into the DMA memory.
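As an illustration only (again with placeholder names, 'map_size' standing in for FEC_ENET_RX_FRSIZE - fep->rx_align), the two choices look like this:

    static void rx_sync_for_cpu_sketch(struct device *dev, dma_addr_t buf_dma,
                                       size_t map_size, u32 length, bool whole)
    {
            if (whole)
                    /* What the driver does today: invalidate the whole
                     * device-owned area so no stale or dirty line survives. */
                    dma_sync_single_for_cpu(dev, buf_dma, map_size,
                                            DMA_FROM_DEVICE);
            else
                    /* The proposed shortcut: sync only the received bytes.
                     * Lines beyond 'length' may be dirty, or become dirty via
                     * speculative prefetch; where "invalidate" is implemented
                     * as clean + invalidate they can later be written back on
                     * top of the next frame DMA'd into this buffer. */
                    dma_sync_single_for_cpu(dev, buf_dma, length,
                                            DMA_FROM_DEVICE);
    }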
>
> 2. Is dma_sync_single_for_device even necessary? There is no data passed
> back to the device because the skb descriptor buffer is not modified and the
> fec peripheral does not need any valid data.
> The example in
> https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt
> states:
> /* CPU should not write to
> * DMA_FROM_DEVICE-mapped area,
> * so dma_sync_single_for_device() is
> * not needed here. It would be required
> * for DMA_BIDIRECTIONAL mapping if
> * the memory was modified.
> */
The dma_sync_single_for_device() call is there to ensure the whole area holds no dirty cache lines before the device writes into it again.
> I am new to the DMA API on ARM. Are these changes regarding cache
> flushing,... allowed? These would increase the copybreak throughput by
> reducing CPU load.
To avoid RX FIFO overrun, you also need to make sure that PHY pause frames (flow control) are enabled.
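If pause frames are not already on, they can be enabled from user space; below is a rough user-space C sketch using the ethtool ioctl, roughly equivalent to "ethtool -A eth0 rx on tx on" (the interface name "eth0" is only an example).

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <linux/ethtool.h>
    #include <linux/sockios.h>

    int main(void)
    {
            struct ethtool_pauseparam pause = { .cmd = ETHTOOL_SPAUSEPARAM };
            struct ifreq ifr;
            int fd = socket(AF_INET, SOCK_DGRAM, 0);

            if (fd < 0) {
                    perror("socket");
                    return 1;
            }

            memset(&ifr, 0, sizeof(ifr));
            strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);

            pause.rx_pause = 1;     /* honour received pause frames */
            pause.tx_pause = 1;     /* send pause frames on RX pressure */
            ifr.ifr_data = (void *)&pause;

            if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
                    perror("SIOCETHTOOL");

            close(fd);
            return 0;
    }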