Message-Id: <20210614022504.24458-1-mcroce@linux.microsoft.com>
Date:   Mon, 14 Jun 2021 04:25:04 +0200
From:   Matteo Croce <mcroce@...ux.microsoft.com>
To:     netdev@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
        Giuseppe Cavallaro <peppe.cavallaro@...com>,
        Alexandre Torgue <alexandre.torgue@...s.st.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Palmer Dabbelt <palmer@...belt.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Drew Fustini <drew@...gleboard.org>,
        Emil Renner Berthing <kernel@...il.dk>
Subject: [PATCH net-next] stmmac: align RX buffers

From: Matteo Croce <mcroce@...rosoft.com>

On RX, an SKB is allocated and the received buffer is copied into it.
On some architectures, however, memcpy() is only efficient when the
source and destination buffers have the same alignment.

That is not the case here, because the SKB data pointer is misaligned
by two bytes to compensate for the Ethernet header.
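
For context, the two-byte offset comes from the common NET_IP_ALIGN
allocation pattern. A minimal sketch of that generic convention (not
code from this driver):

	/* NET_IP_ALIGN is typically 2: reserving it makes the 14-byte
	 * Ethernet header end on a 4-byte boundary, so the IP header
	 * that follows is naturally aligned, at the cost of a data
	 * pointer that is itself misaligned by two bytes.
	 */
	skb = netdev_alloc_skb(dev, len + NET_IP_ALIGN);
	if (skb)
		skb_reserve(skb, NET_IP_ALIGN);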

Align the RX buffer the same way as the SKB one, so that the copy is
faster. An iperf3 RX test on a RISC-V machine shows a roughly 50%
throughput improvement:

before:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   733 MBytes   615 Mbits/sec   88             sender
[  5]   0.00-10.01  sec   730 MBytes   612 Mbits/sec                  receiver

after:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec                  receiver

And the memcpy() overhead during RX drops dramatically, as the perf
profiles show:

before:
Overhead  Shared Object  Symbol
  43.35%  [kernel]       [k] memcpy
  33.77%  [kernel]       [k] __asm_copy_to_user
   3.64%  [kernel]       [k] sifive_l2_flush64_range

after:
Overhead  Shared Object  Symbol
  45.40%  [kernel]       [k] __asm_copy_to_user
  28.09%  [kernel]       [k] memcpy
   4.27%  [kernel]       [k] sifive_l2_flush64_range
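
The alignments now match because napi_alloc_skb() reserves
NET_SKB_PAD + NET_IP_ALIGN of headroom, which is exactly the offset
this patch applies to the RX buffer. A simplified sketch of the RX
copy path (illustrative, not the literal driver code; buf1_len and
the buf/ch fields stand in for the driver's bookkeeping):

	/* Destination: the SKB data pointer, which napi_alloc_skb()
	 * places NET_SKB_PAD + NET_IP_ALIGN bytes into the allocation.
	 * Source: the page buffer, now at the same offset, so both
	 * pointers are equally (mis)aligned and memcpy() runs fast.
	 */
	skb = napi_alloc_skb(&ch->rx_napi, buf1_len);
	skb_copy_to_linear_data(skb, page_address(buf->page) +
				stmmac_rx_offset(priv), buf1_len);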

Signed-off-by: Matteo Croce <mcroce@...rosoft.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index b6cd43eda7ac..04bdb3950d63 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -338,9 +338,9 @@ static inline bool stmmac_xdp_is_enabled(struct stmmac_priv *priv)
 static inline unsigned int stmmac_rx_offset(struct stmmac_priv *priv)
 {
 	if (stmmac_xdp_is_enabled(priv))
-		return XDP_PACKET_HEADROOM;
+		return XDP_PACKET_HEADROOM + NET_IP_ALIGN;
 
-	return 0;
+	return NET_SKB_PAD + NET_IP_ALIGN;
 }
 
 void stmmac_disable_rx_queue(struct stmmac_priv *priv, u32 queue);
-- 
2.31.1
