Message-ID: <20210615012107.577ead86@linux.microsoft.com>
Date: Tue, 15 Jun 2021 01:21:07 +0200
From: Matteo Croce <mcroce@...ux.microsoft.com>
To: David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org, peppe.cavallaro@...com,
alexandre.torgue@...s.st.com, kuba@...nel.org, palmer@...belt.com,
paul.walmsley@...ive.com, drew@...gleboard.org, kernel@...il.dk
Subject: Re: [PATCH net-next] stmmac: align RX buffers
On Mon, 14 Jun 2021 12:51:11 -0700 (PDT)
David Miller <davem@...emloft.net> wrote:
>
> But this means the ethernet header will be misaligned, and this will
> kill performance on some CPUs, as misaligned accesses are resolved
> with a trap handler.
>
> Even on CPUs that don't trap, the access will be slower.
>
> Thanks.
Isn't it the IP header that should be aligned to avoid the expensive traps?
From include/linux/skbuff.h:
* Since an ethernet header is 14 bytes network drivers often end up with
* the IP header at an unaligned offset. The IP header can be aligned by
* shifting the start of the packet by 2 bytes. Drivers should do this
* with:
*
* skb_reserve(skb, NET_IP_ALIGN);
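In a driver's RX path that pattern looks roughly like this (a minimal
sketch; rx_alloc_skb() and the length handling are made up, this is not
stmmac code):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static struct sk_buff *rx_alloc_skb(struct net_device *dev, unsigned int len)
{
	/* ask for 2 extra bytes in front of the frame */
	struct sk_buff *skb = netdev_alloc_skb(dev, len + NET_IP_ALIGN);

	if (!skb)
		return NULL;

	/* shift skb->data by 2, so the IP header that follows the
	 * 14-byte ethernet header lands on a 4-byte boundary */
	skb_reserve(skb, NET_IP_ALIGN);
	return skb;
}

netdev_alloc_skb_ip_align() wraps exactly this allocate-then-reserve
sequence.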
But the problem here really is not the header alignment: the problem is
that the RX buffer is copied into an skb, and the two buffers have
different alignments.
If I add these prints, I get this for every packet:
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5460,6 +5460,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
+ printk("skb->data alignment: %lu\n", (uintptr_t)skb->data & 7);
+ printk("xdp.data alignment: %lu\n" , (uintptr_t)xdp.data & 7);
skb_copy_to_linear_data(skb, xdp.data, buf1_len);
[ 1060.967768] skb->data alignment: 2
[ 1060.971174] xdp.data alignment: 0
[ 1061.967589] skb->data alignment: 2
[ 1061.970994] xdp.data alignment: 0
Many architectures use an optimized memcpy when the low-order bits of the
two pointers match; to name a few:
arch/alpha/lib/memcpy.c:
/* If both source and dest are word aligned copy words */
if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {
arch/xtensa/lib/memcopy.S:
/*
* Destination and source are word-aligned, use word copy.
*/
# copy 16 bytes per iteration for word-aligned dst and word-aligned src
arch/openrisc/lib/memcpy.c:
/* If both source and dest are word aligned copy words */
if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {
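In userspace terms, the technique those implementations rely on looks
roughly like this (an illustrative sketch, not kernel code):

#include <stddef.h>
#include <stdint.h>

static void *word_memcpy(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	const uintptr_t mask = sizeof(long) - 1;

	/* The fast path exists only when the low-order bits of the two
	 * pointers match: copy bytes until both are word aligned, then
	 * move one word at a time. */
	if (((uintptr_t)d & mask) == ((uintptr_t)s & mask)) {
		for (; n && ((uintptr_t)d & mask); n--)
			*d++ = *s++;
		for (; n >= sizeof(long); n -= sizeof(long)) {
			*(long *)d = *(const long *)s;
			d += sizeof(long);
			s += sizeof(long);
		}
	}

	/* mismatched pointers (and the tail) go byte by byte */
	while (n--)
		*d++ = *s++;
	return dest;
}

With skb->data at offset 2 and xdp.data at offset 0 the low-order bits
never match, so a copy like this one can never take the word-sized path.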
And so on. With my patch I (mis)align the two buffers at offset 2
(NET_IP_ALIGN), so the data can be copied faster:
[ 16.648485] skb->data alignment: 2
[ 16.651894] xdp.data alignment: 2
[ 16.714260] skb->data alignment: 2
[ 16.717688] xdp.data alignment: 2
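The gist of the change, as a sketch rather than the actual patch (the
helper name here is made up): hand the NIC a receive address shifted by
NET_IP_ALIGN, so the payload it writes starts at offset 2 in the page,
matching skb->data after skb_reserve():

#include <linux/skbuff.h>	/* NET_IP_ALIGN */
#include <net/page_pool.h>	/* page_pool_get_dma_addr() */

static dma_addr_t rx_buf_dma_addr(struct page *page)
{
	/* make the hardware write the frame at offset 2, not 0 */
	return page_pool_get_dma_addr(page) + NET_IP_ALIGN;
}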
Does this make sense?
Regards,
--
per aspera ad upstream