Message-ID: <78b2d2ad-4e0e-41b7-95b4-b7fe945dfe13@kernel.org>
Date: Thu, 3 Apr 2025 11:37:19 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>, Jiayuan Chen <jiayuan.chen@...ux.dev>
Cc: bpf@...r.kernel.org, mrpre@....com, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman
<eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Mykola Lysenko <mykolal@...com>,
Shuah Khan <shuah@...nel.org>, Willem de Bruijn <willemb@...gle.com>,
Jason Xing <kerneljasonxing@...il.com>,
Anton Protopopov <aspsk@...valent.com>,
Abhishek Chauhan <quic_abchauha@...cinc.com>,
Jordan Rome <linux@...danrome.com>,
Martin Kelly <martin.kelly@...wdstrike.com>,
David Lechner <dlechner@...libre.com>, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, linux-kselftest@...r.kernel.org,
kernel-team <kernel-team@...udflare.com>
Subject: Re: [PATCH bpf v2 2/2] selftests/bpf: add perf test for
adjust_{head,meta}
On 03/04/2025 02.24, Jakub Kicinski wrote:
> On Mon, 31 Mar 2025 11:23:45 +0800 Jiayuan Chen wrote:
>> which is negligible for the net stack.
>>
>> Before memset
>> ./test_progs -a xdp_adjust_head_perf -v
>> run adjust head with size 6 cost 56 ns
>> run adjust head with size 20 cost 56 ns
>> run adjust head with size 40 cost 56 ns
>> run adjust head with size 200 cost 56 ns
>>
>> After memset
>> ./test_progs -a xdp_adjust_head_perf -v
>> run adjust head with size 6 cost 58 ns
>> run adjust head with size 20 cost 58 ns
>> run adjust head with size 40 cost 58 ns
>> run adjust head with size 200 cost 66 ns
>
> FWIW I'm not sure if this is "negligible" for XDP like you say,
> but I defer to Jesper :)
That would be too much for the XDP_DROP use-case, e.g. DDoS protection and
driver hardware evaluation. But this is changing a BPF-helper, which means
it is opt-in by the BPF-programmer. Thus, we can accept a larger
performance overhead here.
I suspect your 2 ns overhead primarily comes from the function call
overhead. On my AMD production system with SRSO mitigation enabled, I
would expect to see around 6 ns of overhead (5.699 ns), which is sad.
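To illustrate where a call can come from, here is a minimal sketch with two hypothetical helpers (`clear_const32()` and `clear_var()` are illustrative names, not kernel functions). It assumes GCC or Clang with optimization enabled: with a compile-time constant length the compiler usually expands memset() into a few inline stores, while a length known only at run time usually leaves a real call. Whether inlining actually happens depends on compiler and flags, so check the generated assembly.

```c
#include <stddef.h>
#include <string.h>

/* Length known at compile time: GCC/Clang usually turn this into a few
 * inline 8-byte stores (QWORD MOVs on x86-64), with no call to memset(). */
void clear_const32(unsigned char *p)
{
	memset(p, 0, 32);
}

/* Length known only at run time: this usually becomes a real call to
 * memset(), so every use pays call/return overhead, which mitigations
 * such as SRSO make more expensive. */
void clear_var(unsigned char *p, size_t len)
{
	memset(p, 0, len);
}
```

Both helpers zero the same bytes; the difference is only in the code the compiler emits for them.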
I've done a lot of benchmarking of memset (see [1]). One take-away is
that memset with a small constant length gets compiled into very fast
code that avoids the function call (basically QWORD MOVs). E.g. memset
with a constant length of 32 is extremely fast [2]; on my system it takes
0.673 ns (of which 0.562 ns is for-loop overhead). Thus, it is possible
to do something faster, as we are clearing very small sizes, i.e. by
using a Duff's device construct like I did for the remainder in [2].
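A minimal sketch of the Duff's-device idea for clearing a small, run-time length, assuming a hypothetical helper name `clear_small()` (not a kernel API) and an 8-byte word size. It is not the code from [2]; it only shows the structure: whole words are cleared by jumping into an unrolled loop, and the last 0-7 bytes by a fall-through switch. Each memset() inside has a constant length, so the compiler can emit a single store for it, which is also safe for unaligned pointers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero 'len' bytes at 'p' without a variable-length
 * memset() call. */
void clear_small(void *p, size_t len)
{
	unsigned char *b = p;
	size_t words = len / 8;

	if (words) {
		size_t n = (words + 3) / 4;	/* passes of the 4x unrolled loop */

		/* Duff's device: enter the unrolled loop so the first pass
		 * clears (words % 4) words, or 4 if it divides evenly. */
		switch (words % 4) {
		case 0:
			do {
				memset(b, 0, 8); b += 8;
				/* fall through */
		case 3:
				memset(b, 0, 8); b += 8;
				/* fall through */
		case 2:
				memset(b, 0, 8); b += 8;
				/* fall through */
		case 1:
				memset(b, 0, 8); b += 8;
			} while (--n > 0);
		}
	}

	/* Remainder: clear the trailing 0-7 bytes, highest index first. */
	switch (len % 8) {
	case 7: b[6] = 0; /* fall through */
	case 6: b[5] = 0; /* fall through */
	case 5: b[4] = 0; /* fall through */
	case 4: b[3] = 0; /* fall through */
	case 3: b[2] = 0; /* fall through */
	case 2: b[1] = 0; /* fall through */
	case 1: b[0] = 0; /* fall through */
	case 0: break;
	}
}
```

The branch structure replaces the call and return with one or two computed jumps; whether that wins over a plain memset() call on a given CPU is something to measure, e.g. with a benchmark like [1].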
In this case, as this is a BPF-helper, I am uncertain whether it is worth
the complexity to add such optimizations... I guess not.
This turned into a long way of saying that I'm okay with this change.
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_memset.c
[2] https://github.com/netoptimizer/prototype-kernel/blob/35b1716d0c300e7fa2c8b6d8cfed2ec81df8f3a4/kernel/lib/time_bench_memset.c#L520-L521
--Jesper