Date:   Tue, 15 Jun 2021 01:21:07 +0200
From:   Matteo Croce <mcroce@...ux.microsoft.com>
To:     David Miller <davem@...emloft.net>
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-riscv@...ts.infradead.org, peppe.cavallaro@...com,
        alexandre.torgue@...s.st.com, kuba@...nel.org, palmer@...belt.com,
        paul.walmsley@...ive.com, drew@...gleboard.org, kernel@...il.dk
Subject: Re: [PATCH net-next] stmmac: align RX buffers

On Mon, 14 Jun 2021 12:51:11 -0700 (PDT)
David Miller <davem@...emloft.net> wrote:

> 
> But this means the Ethernet header will be misaligned, and this will
> kill performance on some CPUs, as misaligned accesses are resolved
> with a trap handler.
> 
> Even on CPUs that don't trap, the access will be slower.
> 
> Thanks.

Isn't it the IP header that should be aligned to avoid the expensive traps?
From include/linux/skbuff.h:

 * Since an ethernet header is 14 bytes network drivers often end up with
 * the IP header at an unaligned offset. The IP header can be aligned by
 * shifting the start of the packet by 2 bytes. Drivers should do this
 * with:
 *
 * skb_reserve(skb, NET_IP_ALIGN);
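
For reference, the usual way a driver applies that advice looks roughly
like this (a generic sketch, not the stmmac RX path; dev, rx_buf and len
are just placeholders here):

	skb = netdev_alloc_skb(dev, len + NET_IP_ALIGN);
	if (!skb)
		return NULL;
	/* shift skb->data by 2 bytes so that the IP header behind the
	 * 14-byte ethernet header lands on a 4-byte boundary */
	skb_reserve(skb, NET_IP_ALIGN);
	skb_put_data(skb, rx_buf, len);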

But the real problem here is not the header alignment: the problem is
that the RX buffer is copied into an skb, and the two buffers have
different alignments.
If I add these prints, I get this for every packet:

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5460,6 +5460,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
+               printk("skb->data alignment: %lu\n", (uintptr_t)skb->data & 7);
+               printk("xdp.data alignment: %lu\n" , (uintptr_t)xdp.data & 7);
                skb_copy_to_linear_data(skb, xdp.data, buf1_len);

[ 1060.967768] skb->data alignment: 2
[ 1060.971174] xdp.data alignment: 0
[ 1061.967589] skb->data alignment: 2
[ 1061.970994] xdp.data alignment: 0

And many architectures use an optimized memcpy when the low-order bits of
the two pointers match; to name a few:

arch/alpha/lib/memcpy.c:
	/* If both source and dest are word aligned copy words */
	if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

arch/xtensa/lib/memcopy.S:
	/*
	 * Destination and source are word-aligned, use word copy.
	 */
	# copy 16 bytes per iteration for word-aligned dst and word-aligned src

arch/openrisc/lib/memcpy.c:
	/* If both source and dest are word aligned copy words */
	if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {
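
The common idea behind those routines is something like this (a simplified
userspace sketch, not the actual kernel code): the word-by-word fast path
is only reachable when source and destination share the same low-order
address bits, otherwise everything goes through the byte-by-byte loop.

	#include <stddef.h>
	#include <stdint.h>

	static void *sketch_memcpy(void *dst, const void *src, size_t n)
	{
		unsigned char *d = dst;
		const unsigned char *s = src;

		/* fast path only if both pointers have the same misalignment */
		if ((((uintptr_t)d ^ (uintptr_t)s) & (sizeof(long) - 1)) == 0) {
			/* copy bytes until both are word aligned... */
			while (n && ((uintptr_t)d & (sizeof(long) - 1))) {
				*d++ = *s++;
				n--;
			}
			/* ...then copy whole words at a time */
			while (n >= sizeof(long)) {
				*(long *)d = *(const long *)s;
				d += sizeof(long);
				s += sizeof(long);
				n -= sizeof(long);
			}
		}
		/* trailing bytes, or everything if the alignments differ */
		while (n--)
			*d++ = *s++;
		return dst;
	}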

Other architectures do the same. With my patch I (mis)align both buffers
at an offset of 2 (NET_IP_ALIGN), so the data can be copied faster:

[   16.648485] skb->data alignment: 2
[   16.651894] xdp.data alignment: 2
[   16.714260] skb->data alignment: 2
[   16.717688] xdp.data alignment: 2

Does this make sense?

Regards,
-- 
per aspera ad upstream
