Message-ID: <AM0PR0402MB36037EF37780AFFB14C0C5CEFF6D0@AM0PR0402MB3603.eurprd04.prod.outlook.com>
Date: Thu, 2 Jul 2020 09:14:53 +0000
From: Andy Duan <fugang.duan@....com>
To: Kegl Rohit <keglrohit@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] net: ethernet: freescale: fec: copybreak handling
throughput, dma_sync_* optimisations allowed?
From: Kegl Rohit <keglrohit@...il.com> Sent: Thursday, July 2, 2020 3:39 PM
> On Thu, Jul 2, 2020 at 6:18 AM Andy Duan <fugang.duan@....com> wrote:
> > That should ensure the whole area is not dirty.
>
> Must dma_sync_single_for_cpu() and dma_sync_single_for_device() always be
> used in pairs, or is pairing optional?
> So in this case, is it really necessary to sync the skb data buffer back via
> dma_sync_single_for_device(), even when the CPU does not change any bytes
> in the skb data buffer, i.e. the access is read-only as it is here?
No. If the buffer is not modified, dma_sync_single_for_device() is not necessary.
On some arm64 cores, such as the Cortex-A53, a dcache invalidate is implemented as
flush + invalidate, so once the buffer has been modified, skipping
dma_sync_single_for_device() causes wrong data to be read back.
The driver also supports Coldfire platforms, and I am not familiar with that architecture.
From the current analysis for arm/arm64, I also think dma_sync_single_for_device() is
not necessary, because the buffer is not modified.
Anyway, it still needs comments from other experts, and it needs a lot of testing and stress testing.
> And there is no DMA_BIDIRECTIONAL mapping.
>
> I thought copybreak is not about the next frame size; it is about the
> current frame, and the actual length is known via the size field in the
> finished DMA descriptor.
> Or do you mean that the next received frame could be a non-copybreak frame?
> 1. Rx of a copybreakable frame with sizeX < copybreak
> 2. copybreak: dma_sync_single_for_cpu(dmabuffer, sizeX)
> 3. copybreak: alloc new_skb, memcpy(new_skb, dmabuffer, sizeX)
> 4. copybreak: dma_sync_single_for_device(dmabuffer, sizeX)
> 5. Rx of a non-copybreakable frame with sizeY >= copybreak
> 6. dma_unmap_single(dmabuffer, FEC_ENET_RX_FRSIZE - fep->rx_align) is
>    called and can cause data corruption, because not all bytes were
>    marked dirty even though neither DMA nor the CPU touched them?
If there is no CPU touch, the cache should be clean.
>
> > > I am new to the DMA API on ARM. Are these changes regarding cache
> > > flushing,... allowed? These would increase the copybreak throughput
> > > by reducing CPU load.
> >
> > To avoid FIFO overrun, it requires to ensure PHY pause frame is enabled.
>
> As the errata states, this is also not always true, because the first xoff
> could arrive too late. Pause frames/flow control are not really common and
> could cause trouble with other random network components behaving
> differently or not supporting pause frames correctly. For example, the
> driver itself enables pause frames for Gigabit by default; but we have no
> Gigabit PHY, so no FEC_QUIRK_HAS_GBIT, and therefore pause frames are not
> supported by the driver as of now.
>
>
> It looks like copybreak is implemented similarly to e1000_main.c
> e1000_copybreak():
> only the real/needed packet length (length = le16_to_cpu(rx_desc->length))
> is synced via dma_sync_single_for_cpu, and there is no
> dma_sync_single_for_device.
>
> Here is a diff with the previous changes, assuming that
> dma_sync_single_for_device must be used to avoid any cache write-backs
> even when no data was changed.
The change below seems fine. Can you collect some data before you send out
the patch for review?
- run iperf stress test to ensure the stability
- collect the performance improvement data
Thanks.
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 2d0d313ee..464783c15 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1387,9 +1387,9 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
>  		return false;
>  
>  	dma_sync_single_for_cpu(&fep->pdev->dev,
>  				fec32_to_cpu(bdp->cbd_bufaddr),
> -				FEC_ENET_RX_FRSIZE - fep->rx_align,
> +				length,
>  				DMA_FROM_DEVICE);
>  	if (!swap)
>  		memcpy(new_skb->data, (*skb)->data, length);
>  	else
> @@ -1413,8 +1413,9 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>  	unsigned short status;
>  	struct sk_buff *skb_new = NULL;
>  	struct sk_buff *skb;
>  	ushort	pkt_len;
> +	ushort	pkt_len_nofcs;
>  	__u8 *data;
>  	int	pkt_received = 0;
>  	struct	bufdesc_ex *ebdp = NULL;
>  	bool	vlan_packet_rcvd = false;
> @@ -1479,9 +1480,10 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>  		/* The packet length includes FCS, but we don't want to
>  		 * include that when passing upstream as it messes up
>  		 * bridging applications.
>  		 */
> -		is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len - 4,
> +		pkt_len_nofcs = pkt_len - 4;
> +		is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len_nofcs,
>  						  need_swap);
>  		if (!is_copybreak) {
>  			skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE);
>  			if (unlikely(!skb_new)) {
> @@ -1554,9 +1556,9 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>  
>  		if (is_copybreak) {
>  			dma_sync_single_for_device(&fep->pdev->dev,
>  						   fec32_to_cpu(bdp->cbd_bufaddr),
> -						   FEC_ENET_RX_FRSIZE - fep->rx_align,
> +						   pkt_len_nofcs,
>  						   DMA_FROM_DEVICE);
>  		} else {
>  			rxq->rx_skbuff[index] = skb_new;
>  			fec_enet_new_rxbdp(ndev, bdp, skb_new);