Message-ID: <063D6719AE5E284EB5DD2968C1650D6D458FB3@AcuExch.aculab.com>
Date: Mon, 13 Jan 2014 10:13:16 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Willy Tarreau' <w@....eu>,
"davem@...emloft.net" <davem@...emloft.net>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Thomas Petazzoni <thomas.petazzoni@...e-electrons.com>,
Gregory CLEMENT <gregory.clement@...e-electrons.com>
Subject: RE: [PATCH net-next 6/6] net: mvneta: implement rx_copybreak
From: Willy Tarreau
> calling dma_map_single()/dma_unmap_single() is quite expensive compared
> to copying a small packet. So let's copy short frames and keep the buffers
> mapped. We set the limit to 256 bytes which seems to give good results both
> on the XP-GP board and on the AX3/4.
Which architecture is this?
I presume it is one that needs IOMMU setup and/or cache flushing.
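The copybreak decision described in the quoted patch text can be modelled in a few lines of user-space C. Everything here (struct rx_buf, rx_copybreak(), RX_COPYBREAK) is hypothetical and only mirrors the behaviour described: frames up to the threshold are copied and the DMA buffer stays mapped, while larger frames are passed up and the buffer is unmapped. A real driver would also call dma_sync_single_range_for_cpu() before copying, as the quoted text notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold mirroring the 256 bytes chosen in the patch. */
#define RX_COPYBREAK 256

/* Hypothetical model of one rx ring buffer; not the mvneta structure. */
struct rx_buf {
    const unsigned char *data; /* received frame */
    size_t len;                /* frame length in bytes */
    bool mapped;               /* still DMA-mapped for the device? */
};

/* Returns true if the frame was copied into 'copy', leaving the buffer
 * mapped so it can go straight back to the hardware. Returns false if
 * the buffer itself goes up the stack; the driver would then unmap it
 * with dma_unmap_single() and map a replacement buffer. */
static bool rx_copybreak(struct rx_buf *buf, unsigned char *copy,
                         size_t copy_cap)
{
    assert(buf != NULL && copy != NULL);
    if (buf->len <= RX_COPYBREAK && buf->len <= copy_cap) {
        /* a real driver would sync the buffer for the CPU first */
        memcpy(copy, buf->data, buf->len); /* cheap for short frames */
        return true;                        /* no unmap, no remap */
    }
    buf->mapped = false; /* stands in for dma_unmap_single() */
    return false;
}
```

Treating a frame of exactly 256 bytes as short (the <= comparison) is a choice made in this sketch, not something stated in the patch description.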
> The Rx small packet rate increased by 16.4% doing this, from 486kpps to
> 573kpps. It is worth noting that even the call to the function
> dma_sync_single_range_for_cpu() is expensive (300 ns) although less
> than dma_unmap_single(). Without it, the packet rate rises to 711kpps
> (+24% more). Thus on systems where coherency from device to CPU is
> guaranteed by a snoop control unit, this patch should provide even more
> gains, and probably rx_copybreak could be increased.
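Converting the quoted packet rates into an average time per packet makes the 300 ns figure easier to check. The helper below is a hypothetical illustration; it only does the unit conversion 1e9 ns / (kpps * 1e3) = 1e6 / kpps.

```c
#include <assert.h>

/* Average time available per packet, in nanoseconds, at a rate given in
 * thousands of packets per second: 1e9 ns / (kpps * 1e3) = 1e6 / kpps. */
static double ns_per_packet(double kpps)
{
    assert(kpps > 0.0);
    return 1e6 / kpps;
}
```

With these figures, 486 kpps is about 2058 ns per packet, 573 kpps about 1745 ns and 711 kpps about 1406 ns. The copybreak change saves roughly 312 ns per packet, and dropping the sync saves roughly 339 ns more, which is in line with the ~300 ns quoted for dma_sync_single_range_for_cpu(), assuming the CPU is the bottleneck and nothing else changed between runs.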
Is that the right way around?
If cache coherency is guaranteed, I'd have thought the DMA sync
would be a no-op.
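David's question can be made concrete with a simplified model of the cache maintenance a sync-for-CPU implies on a non-coherent system: every cache line overlapping the received range has to be invalidated, whereas with hardware coherency (such as the snoop control unit mentioned in the quoted text) there is no maintenance to do. The function name and the 32-byte line size in the example are assumptions for illustration, not values from the thread. Real dma_sync_* calls also go through the kernel's DMA layer, so the call itself may not be entirely free even when no cache maintenance is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines the CPU would have to invalidate before reading
 * [addr, addr + len) after a device wrote it. With a coherent
 * interconnect no maintenance is needed, so the answer is 0. */
static size_t lines_to_invalidate(uintptr_t addr, size_t len,
                                  size_t line, bool coherent)
{
    assert(line != 0 && (line & (line - 1)) == 0); /* power of two */
    if (coherent || len == 0)
        return 0;
    uintptr_t first = addr & ~(uintptr_t)(line - 1);                /* round down */
    uintptr_t end = (addr + len + line - 1) & ~(uintptr_t)(line - 1); /* round up */
    return (size_t)((end - first) / line);
}
```

For example, with 32-byte lines a 256-byte frame starting on a line boundary covers 8 lines, while even 4 bytes straddling a boundary at offset 30 cost 2 lines.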
...
> + memcpy(skb_put(skb, rx_bytes),
> + data + MVNETA_MH_SIZE + NET_SKB_PAD,
> + rx_bytes);
You can probably arrange for the copy to be fully aligned, since
the partial words at both ends of the frame lie within the buffers
and can be safely read and written.
That might speed things up further.
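The alignment suggestion can be sketched in user space as follows. The function copy_aligned_words() is hypothetical. It assumes both buffers have up to sizeof(unsigned long) - 1 bytes of slack around the frame, and that the destination starts on a word boundary. On the source side, the quoted code reads the frame at an offset of MVNETA_MH_SIZE + NET_SKB_PAD into the rx buffer, so there are bytes before it. On the destination side, arranging matching headroom in the new skb is an assumption here. The frame then lands at the same offset within a word as in the source, and a pointer to it is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the len-byte frame at src into dst using only whole, aligned
 * words, including the partial words before and after the frame.
 * dst must be word aligned; returns a pointer to the frame in dst. */
static unsigned char *copy_aligned_words(unsigned char *dst,
                                         const unsigned char *src,
                                         size_t len)
{
    const size_t w = sizeof(unsigned long);
    size_t head = (size_t)((uintptr_t)src % w); /* offset of src in its word */
    const unsigned char *start = src - head;    /* aligned-down source */
    size_t nwords = (head + len + w - 1) / w;   /* includes the tail word */

    assert((uintptr_t)dst % w == 0);
    for (size_t i = 0; i < nwords; i++) {
        unsigned long word;
        /* fixed-size memcpy typically compiles to one aligned load and
         * one aligned store, and avoids strict-aliasing problems */
        memcpy(&word, start + i * w, w);
        memcpy(dst + i * w, &word, w);
    }
    return dst + head;
}
```

The bytes copied outside the frame are don't-care data; the point is that every load and store is a whole aligned word, with no byte-wise head or tail handling.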
David