lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 2 Jul 2020 04:18:16 +0000
From:   Andy Duan <fugang.duan@....com>
To:     Kegl Rohit <keglrohit@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] net: ethernet: freescale: fec: copybreak handling
 throughput, dma_sync_* optimisations allowed?

From: Kegl Rohit <keglrohit@...il.com> Sent: Thursday, July 2, 2020 2:45 AM
> fec_enet_copybreak(u32 length, ...) uses
> 
> dma_sync_single_for_cpu(&fep->pdev->dev,
> fec32_to_cpu(bdp->cbd_bufaddr), FEC_ENET_RX_FRSIZE - fep->rx_align,
> DMA_FROM_DEVICE); if (!swap)
>    memcpy(new_skb->data, (*skb)->data, length);
> 
> to sync the descriptor data buffer and memcpy the data to the new skb
> without calling dma_unmap().
dma_sync_* is enough, no need to call dma_unmap and dma_map_* that
will heavy load.

> Later in fec_enet_rx_queue() the dma descriptor buffer is synced again in the
> opposite direction.
> 
> if (is_copybreak) {
>   dma_sync_single_for_device(&fep->pdev->dev,
> fec32_to_cpu(bdp->cbd_bufaddr),  FEC_ENET_RX_FRSIZE - fep->rx_align,
> DMA_FROM_DEVICE); }
> 
dma_sync_single_for_cpu(DMA_FROM_DEVICE)
	__dma_inv_area  #invalidate the area

dma_sync_single_for_device(DMA_FROM_DEVICE)
	__dma_inv_area  #invalidate the area
	__dma_clean_area #clean the area

dma_sync_single_for_cpu() and dma_sync_single_for_device() are used in pairs,
there have no problem for usage.

> Now the two main questions:
> 1. Is it necessary to call dma_sync_single_for_cpu for the whole buffer size
> (FEC_ENET_RX_FRSIZE - fep->rx_align), wouldn't syncing the real packet
> length which is accessed by memcpy be enough?
> Like so: dma_sync_single_for_cpu(&fep->pdev->dev,
> fec32_to_cpu(bdp->cbd_bufaddr), (u32) length, DMA_FROM_DEVICE);

In general usage, you don't know the next frame size, and cannot ensure
the buffer is dirty or not, so invalidate the whole area for next frame.

On some arm64 A53, the dcache invalidate on A53 is flush + invalidate, 
and prefetch may fetch the area, that may causes dirty data flushed back
to the dma memory if the area has dirty data.

> 
> 2. Is dma_sync_single_for_device even necessary? There is no data passed
> back to the device because the skb descriptor buffer is not modified and the
> fec peripheral does not need any valid data.
> The example in
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.
> kernel.org%2Fdoc%2FDocumentation%2FDMA-API-HOWTO.txt&amp;data=0
> 2%7C01%7Cfugang.duan%40nxp.com%7C7fb56778153a4139214808d81deed
> a6d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C1%7C637292258992
> 313637&amp;sdata=4Nv4J7APzmNTv7Dv39tmwpJhFeZ8bNY1eaoAQnx4FdM
> %3D&amp;reserved=0
> states:
>  /* CPU should not write to
>   * DMA_FROM_DEVICE-mapped area,
>   * so dma_sync_single_for_device() is
>   * not needed here. It would be required
>   * for DMA_BIDIRECTIONAL mapping if
>   * the memory was modified.
>  */
That should ensure the whole area is not dirty.

> I am new to the DMA API on ARM. Are these changes regarding cache
> flushing,... allowed? These would increase the copybreak throughput by
> reducing CPU load.

To avoid FIFO overrun, it requires to ensure PHY pause frame is enabled.

Powered by blists - more mailing lists