Message-ID: <875yellcx6.fsf@toke.dk>
Date: Thu, 08 Dec 2022 23:59:33 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Stanislav Fomichev <sdf@...gle.com>, bpf@...r.kernel.org
Cc: ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
martin.lau@...ux.dev, song@...nel.org, yhs@...com,
john.fastabend@...il.com, kpsingh@...nel.org, sdf@...gle.com,
haoluo@...gle.com, jolsa@...nel.org,
Saeed Mahameed <saeedm@...dia.com>,
David Ahern <dsahern@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Willem de Bruijn <willemb@...gle.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Anatoly Burakov <anatoly.burakov@...el.com>,
Alexander Lobakin <alexandr.lobakin@...el.com>,
Magnus Karlsson <magnus.karlsson@...il.com>,
Maryam Tahhan <mtahhan@...hat.com>, xdp-hints@...-project.net,
netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 11/12] mlx5: Support RX XDP metadata

Stanislav Fomichev <sdf@...gle.com> writes:
> From: Toke Høiland-Jørgensen <toke@...hat.com>
>
> Support RX hash and timestamp metadata kfuncs. We need to pass in the cqe
> pointer to the mlx5e_skb_from* functions so it can be retrieved from the
> XDP ctx to do this.

So I finally managed to get enough ducks in a row to actually benchmark
this, with the caveat that I suddenly can't get the timestamp support to
work (it was working in an earlier version, but now
timestamp_supported() just returns false). I'm not sure whether this is
an issue with the enablement patch or whether I just haven't gotten the
hardware configured properly. I'll investigate some more, but figured
I'd post these results now:

Baseline XDP_DROP: 25,678,262 pps / 38.94 ns/pkt
XDP_DROP + read metadata: 23,924,109 pps / 41.80 ns/pkt
Overhead: 1,754,153 pps / 2.86 ns/pkt

As per the above, this is with three kfunc calls per packet
(metadata_supported(), rx_hash_supported() and rx_hash()), so the 2.86
ns of overhead works out to ~0.95 ns per function call. That's a bit
less than, but not far off from, the ~1.2 ns per call that I'm used to
seeing. The test runs where I accidentally called the default kfuncs
shaved off ~1.3 ns for one fewer kfunc call, so it's definitely in that
ballpark.

I'm not doing anything with the data, just reading it into an on-stack
buffer, so this is the smallest possible delta from just getting the
data out of the driver. I did confirm that the call instructions are
still in the BPF program bytecode when it's dumped back out from the
kernel.
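
For reference, here's a rough sketch of the kind of program this test
amounts to (not the exact one I ran); the kfunc names and signatures are
my reading of this series, so treat them as illustrative rather than
authoritative, and I've left out the metadata_supported() check
mentioned above:

  /* Sketch only: kfunc declarations below follow my reading of this
   * series and may not match the final signatures exactly. */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  extern int bpf_xdp_metadata_rx_hash_supported(const struct xdp_md *ctx) __ksym;
  extern __u32 bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx) __ksym;

  SEC("xdp")
  int rx_hash_drop(struct xdp_md *ctx)
  {
          /* On-stack buffer; volatile so the store isn't optimised away.
           * The value itself is never used. */
          volatile __u32 hash = 0;

          if (bpf_xdp_metadata_rx_hash_supported(ctx))
                  hash = bpf_xdp_metadata_rx_hash(ctx);

          /* Drop everything, same as the XDP_DROP baseline */
          return XDP_DROP;
  }

  char _license[] SEC("license") = "GPL";

The bytecode check above is just a matter of dumping the program back
out (e.g. 'bpftool prog dump xlated name rx_hash_drop' for a program
named like the sketch) and looking for the call instructions.
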
-Toke