Date:   Thu, 2 Jul 2020 09:14:53 +0000
From:   Andy Duan <fugang.duan@....com>
To:     Kegl Rohit <keglrohit@...il.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] net: ethernet: freescale: fec: copybreak handling
 throughput, dma_sync_* optimisations allowed?

From: Kegl Rohit <keglrohit@...il.com> Sent: Thursday, July 2, 2020 3:39 PM
> On Thu, Jul 2, 2020 at 6:18 AM Andy Duan <fugang.duan@....com> wrote:
> > That should ensure the whole area is not dirty.
> 
> Must dma_sync_single_for_cpu() and dma_sync_single_for_device() always be
> used in pairs?
> So in this case, is it really necessary to sync the skb data buffer back via
> dma_sync_single_for_device(), even when the CPU does not change any bytes
> in the skb data buffer (read-only, as in this case)?

No, if the buffer is not modified, dma_sync_single_for_device() is not necessary.

For some arm64 cores, e.g. the A53, dcache invalidate is implemented as flush + invalidate.
Once the buffer is modified, that causes wrong data to be read back if
dma_sync_single_for_device() is skipped.
And the driver also supports ColdFire platforms; I am not familiar with that architecture.

From the current analysis for arm/arm64, I also think dma_sync_single_for_device() is not
necessary, because the buffer is not modified.

Anyway, it still needs review from other experts, and it needs a lot of testing, including stress tests.

> And there is no DMA_BIDIRECTIONAL mapping.
> 
> I thought copybreak is not about the next frame size but about the current
> frame, and the actual length is known via the size field in the finished DMA
> descriptor.
> Or do you mean that the next received frame could be a non-copybreak frame?
> 1. Rx copybreakable frame with sizeX < copybreak
> 2. copybreak: dma_sync_single_for_cpu(dmabuffer, sizeX)
> 3. copybreak: alloc new_skb, memcpy(new_skb, dmabuffer, sizeX)
> 4. copybreak: dma_sync_single_for_device(dmabuffer, sizeX)
> 5. Rx non-copybreakable frame with sizeY >= copybreak
> 6. dma_unmap_single(dmabuffer, FEC_ENET_RX_FRSIZE - fep->rx_align) is
>    called and can cause data corruption, because not all bytes were marked
>    dirty, even if neither DMA nor the CPU touched them?

If the CPU never touches those bytes, the cache lines should be clean, so the unmap cannot write back stale data.
> 
> > > I am new to the DMA API on ARM. Are these changes regarding cache
> > > flushing,... allowed? These would increase the copybreak throughput
> > > by reducing CPU load.
> >
> > To avoid FIFO overrun, PHY pause frames need to be enabled.
> 
> As the errata states, this is also not always true, because the first xoff
> could arrive too late. Pause frames/flow control are not really common and
> could cause trouble with other random network components that act
> differently or do not support pause frames correctly. For example, the
> driver itself does enable pause frames for Gigabit by default. But we have
> no Gigabit PHY, so no FEC_QUIRK_HAS_GBIT, and therefore pause frames are
> not supported by the driver as of now.
> 
> 
> It looks like copybreak is implemented similarly to e1000_copybreak() in
> e1000_main.c. There, only the real/needed packet length (length =
> le16_to_cpu(rx_desc->length)) is synced via dma_sync_single_for_cpu(), and
> there is no dma_sync_single_for_device().
> 
> Here is a diff with the previous changes, assuming that
> dma_sync_single_for_device() must be used to avoid any cache write-backs
> even when no data was changed.

The change below seems fine. Can you collect some data before you send out
the patch for review?
- run iperf stress test to ensure the stability
- collect the performance improvement data 

Thanks.
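The requested runs could look something like the following sketch. iperf3, mpstat, the flags, and the <board-ip> placeholder are assumptions about the test setup, not something specified in this thread:

```shell
# Sketch only: iperf3/mpstat assumed available; <board-ip> stands for the
# board running the patched fec driver.
iperf3 -s                                       # on the board: start a server
iperf3 -c <board-ip> -t 600                     # 10 min TCP stress run
iperf3 -c <board-ip> -u -b 100M -l 128 -t 600   # small UDP frames hit the copybreak path
mpstat 5                                        # on the board: compare CPU load patched vs. unpatched
```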
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 2d0d313ee..464783c15 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1387,9 +1387,9 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
>                 return false;
> 
>         dma_sync_single_for_cpu(&fep->pdev->dev,
>                                 fec32_to_cpu(bdp->cbd_bufaddr),
> -                               FEC_ENET_RX_FRSIZE - fep->rx_align,
> +                               length,
>                                 DMA_FROM_DEVICE);
>         if (!swap)
>                 memcpy(new_skb->data, (*skb)->data, length);
>         else
> @@ -1413,8 +1413,9 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>         unsigned short status;
>         struct  sk_buff *skb_new = NULL;
>         struct  sk_buff *skb;
>         ushort  pkt_len;
> +       ushort  pkt_len_nofcs;
>         __u8 *data;
>         int     pkt_received = 0;
>         struct  bufdesc_ex *ebdp = NULL;
>         bool    vlan_packet_rcvd = false;
> @@ -1479,9 +1480,10 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>                 /* The packet length includes FCS, but we don't want to
>                  * include that when passing upstream as it messes up
>                  * bridging applications.
>                  */
> -               is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len - 4,
> +               pkt_len_nofcs = pkt_len - 4;
> +               is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len_nofcs,
>                                                   need_swap);
>                 if (!is_copybreak) {
>                         skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE);
>                         if (unlikely(!skb_new)) {
> @@ -1554,9 +1556,9 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
> 
>                 if (is_copybreak) {
>                         dma_sync_single_for_device(&fep->pdev->dev,
>                                                    fec32_to_cpu(bdp->cbd_bufaddr),
> -                                                  FEC_ENET_RX_FRSIZE - fep->rx_align,
> +                                                  pkt_len_nofcs,
>                                                    DMA_FROM_DEVICE);
>                 } else {
>                         rxq->rx_skbuff[index] = skb_new;
>                         fec_enet_new_rxbdp(ndev, bdp, skb_new);