Message-ID: <87k01zzgyq.fsf@toke.dk>
Date: Fri, 06 Jan 2023 18:54:37 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Andy Gospodarek <andrew.gospodarek@...adcom.com>
Cc: Tariq Toukan <ttoukan.linux@...il.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Jakub Kicinski <kuba@...nel.org>,
Andy Gospodarek <andrew.gospodarek@...adcom.com>,
ast@...nel.org, daniel@...earbox.net, davem@...emloft.net,
hawk@...nel.org, john.fastabend@...il.com, andrii@...nel.org,
kafai@...com, songliubraving@...com, yhs@...com,
kpsingh@...nel.org, lorenzo.bianconi@...hat.com,
netdev@...r.kernel.org, bpf@...r.kernel.org,
Jesper Dangaard Brouer <brouer@...hat.com>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, gal@...dia.com,
Saeed Mahameed <saeedm@...dia.com>, tariqt@...dia.com
Subject: Re: [PATCH net-next v2] samples/bpf: fixup some tools to be able to
support xdp multibuffer
>>> So my main concern would be that if we "allow" this, the only way to
>>> write an interoperable XDP program will be to use bpf_xdp_load_bytes()
>>> for every packet access. Which will be slower than DPA, so we may end up
>>> inadvertently slowing down all of the XDP ecosystem, because no one is
>>> going to bother with writing two versions of their programs. Whereas if
>>> you can rely on packet headers always being in the linear part, you can
>>> write a lot of the "look at headers and make a decision" type programs
>>> using just DPA, and they'll work for multibuf as well.
>>
>> The question I would have is what is really the 'slow down' for
>> bpf_xdp_load_bytes() vs DPA? I know you and Jesper can tell me how many
>> instructions each use. :)
>
> I can try running some benchmarks to compare the two, sure!
Okay, ran a simple test: a program that just parses the IP header, then
drops the packet. Results as follows:
Baseline (don't touch data):     26.5 Mpps / 37.8 ns/pkt
Touch data (ethernet hdr):       25.0 Mpps / 40.0 ns/pkt
Parse IP (DPA):                  24.1 Mpps / 41.5 ns/pkt
Parse IP (bpf_xdp_load_bytes):   15.3 Mpps / 65.3 ns/pkt
So 2.2 ns of overhead from reading the packet data, another 1.5 ns from
the parsing logic, and a whopping 23.8 ns extra from switching to
bpf_xdp_load_bytes(). This is with two calls to bpf_xdp_load_bytes(),
one to get the Ethernet header, and another to get the IP header.
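For reference, the bpf_xdp_load_bytes() variant looks roughly like this
(a simplified sketch with made-up program names, not the exact code in
the branch linked below):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* "xdp.frags" declares multi-buffer support to libbpf */
SEC("xdp.frags")
int parse_ip_load_bytes(struct xdp_md *ctx)
{
        struct ethhdr eth;
        struct iphdr iph;

        /* First call: copy out the Ethernet header; works no matter
         * which fragment the bytes happen to live in */
        if (bpf_xdp_load_bytes(ctx, 0, &eth, sizeof(eth)))
                return XDP_DROP;

        if (eth.h_proto != bpf_htons(ETH_P_IP))
                return XDP_DROP;

        /* Second call: copy out the IPv4 header */
        if (bpf_xdp_load_bytes(ctx, sizeof(eth), &iph, sizeof(iph)))
                return XDP_DROP;

        return XDP_DROP;
}

char _license[] SEC("license") = "GPL";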
Dropping one of the two calls also cuts the overhead roughly in half,
which fits with ~12 ns of overhead per bpf_xdp_load_bytes() call.
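For comparison, the DPA variant reads the headers straight out of the
linear packet area, something like this (again just a sketch, same
includes as above; it relies on the headers being in the linear part):

SEC("xdp")
int parse_ip_dpa(struct xdp_md *ctx)
{
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *iph = data + sizeof(*eth);

        /* One bounds check covering both headers keeps the verifier happy */
        if (data + sizeof(*eth) + sizeof(*iph) > data_end)
                return XDP_DROP;

        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return XDP_DROP;

        /* Touch an IP header field so the read isn't optimised away */
        if (iph->version != 4)
                return XDP_DROP;

        return XDP_DROP;
}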
I pushed the code I used for testing here, in case someone else wants to
play around with it:
https://github.com/xdp-project/xdp-tools/tree/xdp-load-bytes
It's part of the 'xdp-bench' utility. Run it as:
./xdp-bench drop <iface> -p parse-ip
for DPA parsing and
./xdp-bench drop <iface> -p parse-ip -l
to use bpf_xdp_load_bytes().
-Toke