lists.openwall.net | Open Source and information security mailing list archives
Date: Thu, 2 Jul 2020 09:14:53 +0000
From: Andy Duan <fugang.duan@....com>
To: Kegl Rohit <keglrohit@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [EXT] net: ethernet: freescale: fec: copybreak handling throughput, dma_sync_* optimisations allowed?

From: Kegl Rohit <keglrohit@...il.com> Sent: Thursday, July 2, 2020 3:39 PM
> On Thu, Jul 2, 2020 at 6:18 AM Andy Duan <fugang.duan@....com> wrote:
> > That should ensure the whole area is not dirty.
>
> Can dma_sync_single_for_cpu() and dma_sync_single_for_device() be used
> independently, or must they be used in pairs? In this case, is it really
> necessary to sync the skb data buffer back via
> dma_sync_single_for_device(), even though the CPU does not change any
> bytes in the skb data buffer (it is effectively read-only here)?

No, if the buffer is not modified, dma_sync_single_for_device() is not
necessary.

On some arm64 cores, the dcache invalidate on the A53 is implemented as
flush + invalidate, so once the buffer is modified, reading it back without
dma_sync_single_for_device() returns wrong data.

The driver also supports Coldfire platforms, and I am not familiar with that
architecture. From the current analysis for arm/arm64, I also think
dma_sync_single_for_device() is not necessary because the buffer is not
modified. Anyway, this still needs review from other experts, plus plenty of
functional and stress testing.

> And there is no DMA_BIDIRECTIONAL mapping.
>
> I thought copybreak is not about the next frame size but about the current
> frame, whose actual length is known from the size field in the completed
> DMA descriptor. Or do you mean that the next received frame could be a
> non-copybreak frame?
>
> 1. Rx copybreakable frame with sizeX < copybreak
> 2. copybreak: dma_sync_single_for_cpu(dmabuffer, sizeX)
> 3. copybreak: alloc new_skb, memcpy(new_skb, dmabuffer, sizeX)
> 4. copybreak: dma_sync_single_for_device(dmabuffer, sizeX)
> 5. Rx non-copybreakable frame with sizeY >= copybreak
> 6. dma_unmap_single(dmabuffer, FEC_ENET_RX_FRSIZE - fep->rx_align) is
>    called and can cause data corruption because not all bytes were marked
>    dirty, even though neither DMA nor the CPU touched them?

No CPU touch, so it should be clean.

> > > I am new to the DMA API on ARM. Are these changes regarding cache
> > > flushing, ... allowed? These would increase the copybreak throughput
> > > by reducing CPU load.
> >
> > To avoid FIFO overrun, it requires PHY pause frames to be enabled.
>
> As the errata states, this is also not always true, because the first xoff
> could arrive too late. Pause frames/flow control are not really common and
> could cause trouble with other random network components that behave
> differently or do not support pause frames correctly. For example, the
> driver itself enables pause frames for Gigabit by default. But we have no
> Gigabit PHY, so FEC_QUIRK_HAS_GBIT is not set and pause frames are not
> supported by the driver as of now.
>
> It looks like copybreak is implemented similarly to e1000_copybreak() in
> e1000_main.c. There, only the real/needed packet length (length =
> le16_to_cpu(rx_desc->length)) is synced via dma_sync_single_for_cpu(),
> and there is no dma_sync_single_for_device().
>
> Here is a diff with the previous changes, assuming that
> dma_sync_single_for_device() must be used to avoid any cache flush-backs
> even when no data was changed.

Below change seems fine; can you collect some data before you send out the
patch for review?
- run an iperf stress test to ensure stability
- collect the performance-improvement data

Thanks.
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 2d0d313ee..464783c15 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -1387,9 +1387,9 @@ static bool fec_enet_copybreak(struct net_device *ndev, struct sk_buff **skb,
>  		return false;
>
>  	dma_sync_single_for_cpu(&fep->pdev->dev,
>  				fec32_to_cpu(bdp->cbd_bufaddr),
> -				FEC_ENET_RX_FRSIZE - fep->rx_align,
> +				length,
>  				DMA_FROM_DEVICE);
>  	if (!swap)
>  		memcpy(new_skb->data, (*skb)->data, length);
>  	else
> @@ -1413,8 +1413,9 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>  	unsigned short status;
>  	struct sk_buff *skb_new = NULL;
>  	struct sk_buff *skb;
>  	ushort	pkt_len;
> +	ushort	pkt_len_nofcs;
>  	__u8 *data;
>  	int	pkt_received = 0;
>  	struct	bufdesc_ex *ebdp = NULL;
>  	bool	vlan_packet_rcvd = false;
> @@ -1479,9 +1480,10 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>  		/* The packet length includes FCS, but we don't want to
>  		 * include that when passing upstream as it messes up
>  		 * bridging applications.
>  		 */
> -		is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len - 4,
> +		pkt_len_nofcs = pkt_len - 4;
> +		is_copybreak = fec_enet_copybreak(ndev, &skb, bdp, pkt_len_nofcs,
>  						  need_swap);
>  		if (!is_copybreak) {
>  			skb_new = netdev_alloc_skb(ndev, FEC_ENET_RX_FRSIZE);
>  			if (unlikely(!skb_new)) {
> @@ -1554,9 +1556,9 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>
>  		if (is_copybreak) {
>  			dma_sync_single_for_device(&fep->pdev->dev,
>  						   fec32_to_cpu(bdp->cbd_bufaddr),
> -						   FEC_ENET_RX_FRSIZE - fep->rx_align,
> +						   pkt_len_nofcs,
>  						   DMA_FROM_DEVICE);
>  		} else {
>  			rxq->rx_skbuff[index] = skb_new;
>  			fec_enet_new_rxbdp(ndev, bdp, skb_new);