Message-ID: <874j54iht3.fsf@toke.dk>
Date: Tue, 22 Oct 2024 11:54:32 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Puranjay Mohan <puranjay@...nel.org>, Albert Ou <aou@...s.berkeley.edu>,
Alexei Starovoitov <ast@...nel.org>, Andrew Morton
<akpm@...ux-foundation.org>, Andrii Nakryiko <andrii@...nel.org>,
bpf@...r.kernel.org, Daniel Borkmann <daniel@...earbox.net>, "David S.
Miller" <davem@...emloft.net>, Eduard Zingerman <eddyz87@...il.com>, Eric
Dumazet <edumazet@...gle.com>, Hao Luo <haoluo@...gle.com>, Helge Deller
<deller@....de>, Jakub Kicinski <kuba@...nel.org>, "James E.J. Bottomley"
<James.Bottomley@...senPartnership.com>, Jiri Olsa <jolsa@...nel.org>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
linux-kernel@...r.kernel.org, linux-parisc@...r.kernel.org,
linux-riscv@...ts.infradead.org, Martin KaFai Lau <martin.lau@...ux.dev>,
Mykola Lysenko <mykolal@...com>, netdev@...r.kernel.org, Palmer Dabbelt
<palmer@...belt.com>, Paolo Abeni <pabeni@...hat.com>, Paul Walmsley
<paul.walmsley@...ive.com>, Puranjay Mohan <puranjay12@...il.com>,
Puranjay Mohan <puranjay@...nel.org>, Shuah Khan <shuah@...nel.org>, Song
Liu <song@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Yonghong Song
<yonghong.song@...ux.dev>
Subject: Re: [PATCH bpf-next 2/5] bpf: bpf_csum_diff: optimize and
homogenize for all archs
Puranjay Mohan <puranjay@...nel.org> writes:
> 1. Optimization
> ------------
>
> The current implementation copies the 'from' and 'to' buffers to a
> scratchpad, taking the bitwise NOT of the 'from' buffer while copying.
> In the next step, csum_partial() is called on this scratchpad.
>
> So, mathematically, the current implementation computes:
>
> result = csum(to - from)
>
> Here, 'to' and '~from' are copied into the scratchpad buffer. This is
> needed because csum_partial() takes a single contiguous buffer, not
> two disjoint buffers like 'to' and 'from'.
>
> We can rewrite this equation as:
>
> result = csum(to) - csum(from)
>
> using the distributive property of csum().
>
> This allows 'to' and 'from' to be at different locations, so the
> scratchpad and the copying are no longer needed.
>
> In C, this looks like:
>
> result = csum_sub(csum_partial(to, to_size, seed),
> csum_partial(from, from_size, 0));
>
> 2. Homogenization
> --------------
>
> The bpf_csum_diff() helper calls csum_partial(), which is implemented
> by some architectures like arm and x86, while other architectures rely
> on the generic implementation in lib/checksum.c.
>
> The generic implementation in lib/checksum.c returns a 16-bit value,
> but the arch-specific implementations can return more than 16 bits.
> This works out in most places because, before the result is used, it
> is passed through csum_fold(), which turns it into a 16-bit value.
>
> bpf_csum_diff() directly returns the value from csum_partial(), and
> therefore the returned values could be different on different
> architectures. See the discussion in [1]:
>
> For the int value 28, the calculated checksums are:
>
> x86 : -29 : 0xffffffe3
> generic (arm64, riscv) : 65507 : 0x0000ffe3
> arm : 131042 : 0x0001ffe2
>
> Pass the result of bpf_csum_diff() through from32to16() before returning
> to homogenize this result for all architectures.
>
> NOTE: from32to16() is used instead of csum_fold() because csum_fold()
> does from32to16() + bitwise NOT of the result, which is not what we want
> to do here.
>
> [1] https://lore.kernel.org/bpf/CAJ+HfNiQbOcqCLxFUP2FMm5QrLXUUaj852Fxe3hn_2JNiucn6g@mail.gmail.com/
>
> Signed-off-by: Puranjay Mohan <puranjay@...nel.org>
Pretty neat simplification :)
Reviewed-by: Toke Høiland-Jørgensen <toke@...hat.com>
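
The two points in the quoted description can be checked with a small
userspace sketch in C. This is not the kernel code: csum_words() is a
hypothetical stand-in for csum_partial() that sums host-order 32-bit
words, MAX_WORDS is an arbitrary limit chosen for the example, and the
csum_add(), csum_sub() and from32to16() helpers are written here on the
assumption that they behave like the kernel helpers of the same names.
Both paths are folded to 16 bits only so that they can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORDS 16 /* arbitrary scratchpad size for this sketch */

/* 32-bit ones' complement add: the carry out is added back in. */
static uint32_t csum_add(uint32_t a, uint32_t b)
{
	uint32_t res = a + b;

	return res + (res < b);
}

/* Ones' complement subtract: add the bitwise NOT of the subtrahend. */
static uint32_t csum_sub(uint32_t a, uint32_t b)
{
	return csum_add(a, ~b);
}

/* Fold a 32-bit partial sum into 16 bits, without the final NOT. */
static uint16_t from32to16(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);
	x = (x & 0xffff) + (x >> 16);
	return (uint16_t)x;
}

/* Hypothetical stand-in for csum_partial(): sum n words onto a seed. */
static uint32_t csum_words(const uint32_t *buf, size_t n, uint32_t seed)
{
	uint32_t sum = seed;

	for (size_t i = 0; i < n; i++)
		sum = csum_add(sum, buf[i]);
	return sum;
}

/* Scratchpad approach: copy ~from and to into one buffer, then sum it. */
static uint16_t diff_scratchpad(const uint32_t *from, size_t nf,
				const uint32_t *to, size_t nt, uint32_t seed)
{
	uint32_t sp[MAX_WORDS];
	size_t j = 0;

	assert(nf + nt <= MAX_WORDS);
	for (size_t i = 0; i < nf; i++)
		sp[j++] = ~from[i];
	for (size_t i = 0; i < nt; i++)
		sp[j++] = to[i];
	return from32to16(csum_words(sp, j, seed));
}

/* Direct approach: csum(to) - csum(from), no copying needed. */
static uint16_t diff_direct(const uint32_t *from, size_t nf,
			    const uint32_t *to, size_t nt, uint32_t seed)
{
	return from32to16(csum_sub(csum_words(to, nt, seed),
				   csum_words(from, nf, 0)));
}
```

With these definitions, from32to16() maps each of the three values in
the quoted table (0xffffffe3, 0x0000ffe3, 0x0001ffe2) to 0xffe3, and for
a 'from' buffer holding the single word 28 with an empty 'to' buffer,
both diff_scratchpad() and diff_direct() return 0xffe3 as well. In
general the two results should agree modulo 0xffff, since 0x0000 and
0xffff both represent zero in ones' complement arithmetic.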