netdev - Re: [PATCH net-next v9 2/3] net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment

Message-ID: <663cdcb73953_126914294b5@willemb.c.googlers.com.notmuch>
Date: Thu, 09 May 2024 10:24:55 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Richard Gobert <richardbgobert@...il.com>, 
 richardbgobert@...il.com
Cc: alexander.duyck@...il.com, 
 davem@...emloft.net, 
 dsahern@...nel.org, 
 edumazet@...gle.com, 
 kuba@...nel.org, 
 linux-kernel@...r.kernel.org, 
 linux-kselftest@...r.kernel.org, 
 netdev@...r.kernel.org, 
 pabeni@...hat.com, 
 shuah@...nel.org, 
 willemdebruijn.kernel@...il.com
Subject: Re: [PATCH net-next v9 2/3] net: gro: move L3 flush checks to
 tcp_gro_receive and udp_gro_receive_segment

Richard Gobert wrote:
> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
> iph->id, ...) against all packets in a loop. These flush checks are used in
> all merging UDP and TCP flows.
> 
> These checks need to be done only once and only against the found p skb,
> since they only affect flush and not same_flow.
> 
> This patch leverages correct network header offsets from the cb for both
> outer and inner network headers - allowing these checks to be done only
> once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
> NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
> more declarative and contained in inet_gro_flush, thus removing the need
> for flush_id in napi_gro_cb.
> 
> This results in less parsing code for non-loop flush tests for TCP and UDP
> flows.
> 
> To make sure results are not within noise range - I've made netfilter drop
> all TCP packets, and measured CPU performance in GRO (in this case GRO is
> responsible for about 50% of the CPU utilization).
> 
> perf top while replaying 64 parallel IP/TCP streams merging in GRO:
> (gro_receive_network_flush is compiled inline to tcp_gro_receive)
> net-next:
>         6.94% [kernel] [k] inet_gro_receive
>         3.02% [kernel] [k] tcp_gro_receive
> 
> patch applied:
>         4.27% [kernel] [k] tcp_gro_receive
>         4.22% [kernel] [k] inet_gro_receive
> 
> perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
> results for any encapsulation, in this case inet_gro_receive is top
> offender in net-next)
> net-next:
>         10.09% [kernel] [k] inet_gro_receive
>         2.08% [kernel] [k] tcp_gro_receive
> 
> patch applied:
>         6.97% [kernel] [k] inet_gro_receive
>         3.68% [kernel] [k] tcp_gro_receive
> 
> Signed-off-by: Richard Gobert <richardbgobert@...il.com>

> +static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
> +				 struct sk_buff *p, bool outer)
> +{
> +	const u32 id = ntohl(*(__be32 *)&iph->id);
> +	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
> +	const u16 ipid_offset = (id >> 16) - (id2 >> 16);
> +	const u16 count = NAPI_GRO_CB(p)->count;
> +	const u32 df = id & IP_DF;
> +	int flush;
> +
> +	/* All fields must match except length and checksum. */
> +	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
> +
> +	if (outer && df)
> +		return flush;

    if (flush)
            return 1;

That is, return early here so that the two "flush |" below can be dropped?
Or, to avoid adding a branch:

    if (flush | (outer && df))
            return 1;

> +
> +	/* When we receive our second frame we can make a decision on if we
> +	 * continue this flow as an atomic flow with a fixed ID or if we use
> +	 * an incrementing ID.
> +	 */
> +	if (count == 1 && df && !ipid_offset)
> +		NAPI_GRO_CB(p)->ip_fixedid = true;
> +
> +	if (NAPI_GRO_CB(p)->ip_fixedid && df)
> +		return flush | ipid_offset;
> +
> +	return flush | (ipid_offset ^ count);

And then simply

    if (NAPI_GRO_CB(p)->ip_fixedid)
            return ipid_offset;
    else
            return ipid_offset ^ count;

Since NAPI_GRO_CB(p)->ip_fixedid is only set if DF is set on the first
two segments, and df ^ (id2 & IP_DF) is already tested above, there is
no need to test df again.

> +}
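
For readers following the restructuring discussed above, here is a minimal
userspace sketch that places the quoted logic next to the early-return
variant (the first of the two suggestions; the combined-branch alternative
is not modeled). It is not kernel code. The names model_hdr, model_cb,
MODEL_IP_DF, flush_original and flush_restructured are hypothetical, the
header fields are kept in host byte order instead of being read through the
ntohl() cast, and the two variants are only expected to agree on zero versus
non-zero, assuming ip_fixedid is only ever set by the function itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IP_DF 0x4000u /* Don't Fragment bit, same value as the kernel's IP_DF */

/* Hypothetical stand-in for the iphdr fields the check reads (host order). */
struct model_hdr {
	uint8_t ttl;
	uint8_t tos;
	uint16_t id;
	uint16_t frag_off;
};

/* Hypothetical stand-in for the NAPI_GRO_CB(p) fields the check uses. */
struct model_cb {
	uint16_t count;
	bool ip_fixedid;
};

/* Mirrors the control flow of the quoted inet_gro_flush(). */
static int flush_original(const struct model_hdr *h, const struct model_hdr *h2,
			  struct model_cb *cb, bool outer)
{
	const uint16_t ipid_offset = (uint16_t)(h->id - h2->id);
	const uint16_t count = cb->count;
	const uint32_t df = h->frag_off & MODEL_IP_DF;
	int flush;

	flush = (h->ttl ^ h2->ttl) | (h->tos ^ h2->tos) |
		(df ^ (h2->frag_off & MODEL_IP_DF));

	if (outer && df)
		return flush;

	if (count == 1 && df && !ipid_offset)
		cb->ip_fixedid = true;

	if (cb->ip_fixedid && df)
		return flush | ipid_offset;

	return flush | (ipid_offset ^ count);
}

/* Early-return variant: once flush is known to be zero, the later returns
 * no longer need to OR it in, and ip_fixedid already implies DF. */
static int flush_restructured(const struct model_hdr *h, const struct model_hdr *h2,
			      struct model_cb *cb, bool outer)
{
	const uint16_t ipid_offset = (uint16_t)(h->id - h2->id);
	const uint16_t count = cb->count;
	const uint32_t df = h->frag_off & MODEL_IP_DF;
	int flush;

	flush = (h->ttl ^ h2->ttl) | (h->tos ^ h2->tos) |
		(df ^ (h2->frag_off & MODEL_IP_DF));

	if (outer && df)
		return flush;

	if (flush)
		return 1;

	if (count == 1 && df && !ipid_offset)
		cb->ip_fixedid = true;

	if (cb->ip_fixedid)
		return ipid_offset;

	return ipid_offset ^ count;
}
```

One difference to keep in mind: in this model the original may set
ip_fixedid even when flush is non-zero, while the early-return variant does
not. Whether that matters in the kernel depends on how p is handled after a
flush, which this sketch does not cover.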