Message-ID: <b23ed0e2-05cf-454b-bf7a-a637c9bb48e8@kernel.org>
Date: Tue, 29 Jul 2025 13:15:53 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>, Lorenzo Bianconi <lorenzo@...nel.org>
Cc: Stanislav Fomichev <stfomichev@...il.com>, bpf@...r.kernel.org,
netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <borkmann@...earbox.net>,
Eric Dumazet <eric.dumazet@...il.com>, "David S. Miller"
<davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>, sdf@...ichev.me,
kernel-team@...udflare.com, arthur@...hurfabre.com, jakub@...udflare.com,
Jesse Brandeburg <jbrandeburg@...udflare.com>,
Andrew Rzeznik <arzeznik@...udflare.com>
Subject: Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for
XDP_REDIRECTed packets

On 28/07/2025 18.29, Jakub Kicinski wrote:
> On Mon, 28 Jul 2025 12:53:01 +0200 Lorenzo Bianconi wrote:
>>>> I can see why you might think that, but from my perspective, the
>>>> xdp_frame *is* the implementation of the mini-SKB concept. We've been
>>>> building it incrementally for years. It started as the most minimal
>>>> structure possible and has gradually gained more context (e.g. dev_rx,
>>>> mem_info/rxq_info, flags, and also uses skb_shared_info with same layout
>>>> as SKB).
>>>
>>> My understanding was that just adding all the fields to xdp_frame was
>>> considered too wasteful. Otherwise we would have done something along
>>> those lines ~10 years ago :S
>>
>> Hi Jakub,
>>
>> sorry for the late reply.

Same, back from vacation.

>> I am completely fine to redesign the solution to overcome the problem but I
>> guess this feature will allow us to improve XDP performance in a common/real
>> use-case. Let's consider we want to redirect a packet into a veth and then into
>> a container. Preserving the hw metadata performing XDP_REDIRECT will allow us
>> to avoid recalculating the checksum creating the skb. This will result in a
>> very nice performance improvement.
>> So I guess we should really come up with some idea to add this missing feature.
>
>
> Martin mentioned to me that he had proposed in the past that we allow
> allocating the skb at the XDP level, if the program needs "skb-level
> metadata". That actually seems pretty clean to me.. Was it ever
> explored?

That idea has been considered before, but it unfortunately doesn't work
from a performance angle. The performance model of XDP_REDIRECT into
CPUMAP relies on moving the expensive SKB allocation+init to a remote
CPU. This keeps the ingress CPU free to process packets at near line
rate (our DDoS use-case). If we allocate the SKB on the ingress CPU
before the redirect, we destroy this load-balancing model and create the
exact bottleneck we designed CPUMAP to avoid.

To bring the focus back to the specific problem this series solves,
let's review the concrete use case. Our IPsec scenario is a key example:
on the ingress CPU, an XDP program calculates a hash from inner packet
headers to load-balance traffic via CPUMAP. When the packet arrives on
the remote CPU, this hash is lost, so the new SKB is created with a hash
of zero. This, in turn, causes poor load-balancing when the packet is
forwarded to a multi-queue device like veth, as traffic often collapses
to a single queue. The purpose of this patchset is simply to provide a
standard way to carry that hash to the remote CPU within the xdp_frame.
(Same goes for a standard way to carry VLAN tags)
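
As a rough illustration of the collapse described above, here is a toy
userspace model (not kernel code). It assumes the TX queue is picked by
scaling a 32-bit hash into the queue range, the same arithmetic as the
kernel's reciprocal_scale(). The names pick_queue, queue_spread and
make_flow_hashes, the queue count and the random per-flow hashes are all
hypothetical, chosen only for this sketch.

```python
# Toy userspace model of hash-based queue selection; not kernel code.
import random
from collections import Counter

NUM_QUEUES = 4  # hypothetical queue count of the multi-queue device


def pick_queue(pkt_hash: int, num_queues: int = NUM_QUEUES) -> int:
    """Scale a 32-bit hash into [0, num_queues), like reciprocal_scale()."""
    return ((pkt_hash & 0xFFFFFFFF) * num_queues) >> 32


def queue_spread(hashes, num_queues: int = NUM_QUEUES) -> Counter:
    """Count how many packets land on each queue."""
    return Counter(pick_queue(h, num_queues) for h in hashes)


def make_flow_hashes(n_flows: int, seed: int = 1):
    """Stand-in for per-flow hashes computed by the XDP program."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(n_flows)]


if __name__ == "__main__":
    flow_hashes = make_flow_hashes(1000)
    # Hash lost across the redirect: every packet carries hash 0.
    print("hash lost:", dict(queue_spread([0] * len(flow_hashes))))
    # Hash carried to the remote CPU: packets spread over the queues.
    print("hash kept:", dict(queue_spread(flow_hashes)))
```

In this model a hash of zero always maps to queue 0, while preserved
per-flow hashes spread across all queues. The real selection path has
more steps, so treat this only as a picture of the effect.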

Given this specific problem, is there a better approach to solving it
than what this patchset proposes?

--Jesper