netdev - Re: [PATCH] net: xdp: allow user space to request a smaller packet headroom requirement

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <86673054-9fbd-c4db-7a4b-0fe904a2ca7f@redhat.com>
Date:   Tue, 15 Mar 2022 07:50:08 +0100
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     Felix Fietkau <nbd@....name>,
        Daniel Borkmann <daniel@...earbox.net>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        "Jesper D. Brouer" <netdev@...uer.com>, netdev@...r.kernel.org,
        bpf <bpf@...r.kernel.org>,
        Lorenzo Bianconi <lorenzo.bianconi@...hat.com>
Cc:     brouer@...hat.com, John Fastabend <john.fastabend@...il.com>,
        Jakub Kicinski <kuba@...nel.org>
Subject: Re: [PATCH] net: xdp: allow user space to request a smaller packet
 headroom requirement

On 14/03/2022 23.43, Felix Fietkau wrote:
> 
> On 14.03.22 23:20, Daniel Borkmann wrote:
>> On 3/14/22 11:16 PM, Toke Høiland-Jørgensen wrote:
>>> Felix Fietkau <nbd@....name> writes:
>>>> On 14.03.22 21:39, Jesper D. Brouer wrote:
>>>>> (Cc. BPF list and other XDP maintainers)
>>>>> On 14/03/2022 11.22, Felix Fietkau wrote:
>>>>>> Most ethernet drivers allocate a packet headroom of NET_SKB_PAD. 
>>>>>> Since it is
>>>>>> rounded up to L1 cache size, it ends up being at least 64 bytes on 
>>>>>> the most
>>>>>> common platforms.
>>>>>> On most ethernet drivers, having a guaranteed headroom of 256 
>>>>>> bytes for XDP
>>>>>> adds an extra forced pskb_expand_head call when enabling SKB XDP, 
>>>>>> which can
>>>>>> be quite expensive.
>>>>>> Many XDP programs need only very little headroom, so it can be 
>>>>>> beneficial
>>>>>> to have a way to opt-out of the 256 bytes headroom requirement.
>>>>>
>>>>> IMHO 64 bytes is too small.
>>>>> We are using this area for struct xdp_frame and also for metadata
>>>>> (XDP-hints).  This will limit us from growing this structures for
>>>>> the sake of generic-XDP.
>>>>>
>>>>> I'm fine with reducting this to 192 bytes, as most Intel drivers
>>>>> have this headroom, and have defacto established that this is
>>>>> a valid XDP headroom, even for native-XDP.
>>>>>
>>>>> We could go a small as two cachelines 128 bytes, as if xdp_frame
>>>>> and metadata grows above a cache-line (64 bytes) each, then we have
>>>>> done something wrong (performance wise).
 >>>>
>>>> Here's some background on why I chose 64 bytes: I'm currently
>>>> implementing a userspace + xdp program to act as generic fastpath to
>>>> speed network bridging.

Cc. Lorenzo as you mention you wanted to accelerate Linux bridging.
We can avoid a lot of external FIB sync in userspace, if we
add some BPF-helpers to lookup in bridge FIB table.

>>>
>>> Any reason this can't run in the TC ingress hook instead? Generic XDP is
>>> a bit of an odd duck, and I'm not a huge fan of special-casing it this
>>> way...
>>
>> +1, would have been fine with generic reduction to just down to 192 bytes
>> (though not less than that), but 64 is a bit too little. 

+1

>> Also curious on why not tc ingress instead?
 >
> I chose XDP because of bpf_redirect_map, which doesn't seem to be 
> available to tc ingress classifier programs.

TC have a "normal" bpf_redirect which is slower than a bpf_redirect_map.
The secret to the performance boost from bpf_redirect_map is the hidden
TX bulking layer.  I have experimented with TC bulking, which showed a
30% performance boost on TC redirect to TX with fixed 32 frame bulking.

Notice that newer libbpf makes it easier to attach to the TC hook
without shell'ing out to 'tc' binary. See how to use in this example:

https://github.com/xdp-project/bpf-examples/blob/master/tc-policy/tc_txq_policy.c

> When I started writing the code, I didn't know that generic XDP 
> performance would be bad on pretty much any ethernet/WLAN driver that 
> wasn't updated to support it.

Yes, we kind of kept generic-XDP non-optimized as the real tool for
the  job is TC-BPF, given at this stage the SKB have already been 
allocate, and converting it back to a XDP representation is not
going to be faster that TC-BPF.

Maybe we should optimize generic-XDP a bit more, because I'm
buying the argument, that developers want to write one BPF program
that works across device drivers, instead of having to handle
both TC-BPF and XDP-BPF.

Notice that generic-XDP doesn't actually do the bulking step, even
though using bpf_redirect_map. (would be obvious optimization).

If we extend kernel/bpf.devmap.c with bulking for SKBs, then both
TC-BPF and generic-XDP could share that code path, and obtain
the TX bulking performance boost via SKB-list + xmit_more.

--Jesper