Message-ID: <877cc8e7io.fsf@cloudflare.com>
Date: Thu, 22 Aug 2024 20:29:03 +0200
From: Jakub Sitnicki <jakub@...udflare.com>
To: Philo Lu <lulie@...ux.alibaba.com>
Cc: bpf <bpf@...r.kernel.org>, netdev@...r.kernel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, Eric Dumazet
<edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, kernel-team
<kernel-team@...udflare.com>
Subject: Re: Question: Move BPF_SK_LOOKUP ahead of connected UDP sk lookup?
On Wed, Aug 21, 2024 at 07:44 PM +08, Philo Lu wrote:
> On 2024/8/21 17:23, Jakub Sitnicki wrote:
>> Hi Philo,
>> [CC Eric and Paolo who have more context than me here.]
>> On Tue, Aug 20, 2024 at 08:31 PM +08, Philo Lu wrote:
>>> Hi all, I wonder if it is feasible to move BPF_SK_LOOKUP ahead of connected UDP
>>> sk lookup?
>>>
> ...
>>>
>>> So is there any other problem with it? Otherwise I'll try to work on it and
>>> submit patches later.
>>>
>>> [0]https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
>>>
>>> Thank you for your time.
>> It was done like that to maintain the connected UDP socket guarantees.
>> Similarly to the established TCP sockets. The contract is that if you
>> are bound to a 4-tuple, you will receive the packets destined to it.
>>
>
> Thanks for your explanation. IIUC, bpf_sk_lookup was designed to skip connected
> socket lookup (established for TCP and connected for UDP), so it is not supposed
> to run before connected UDP lookup.
> (though it seems so close to solve our problem...)
Yes, correct. Motivation behind bpf_sk_lookup was to steer TCP
connections & UDP flows to listening / unconnected sockets, like you can
do with TPROXY [1].
Since it had nothing to do with established / connected sockets, we
added the BPF hook in such a way that they are unaffected by it.
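To make that concrete, here is a minimal sk_lookup sketch (the program
and map names and the single-slot SOCKMAP layout are just illustrative):
it assigns incoming flows to one listening/unconnected socket kept in a
SOCKMAP, and falls back to the regular lookup when the map is empty.

/* Minimal sk_lookup sketch: steer flows to one listening/unconnected
 * socket stored in a SOCKMAP; fall back to regular lookup otherwise. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} target_sock SEC(".maps");

SEC("sk_lookup")
int steer(struct bpf_sk_lookup *ctx)
{
	__u32 key = 0;
	struct bpf_sock *sk;

	sk = bpf_map_lookup_elem(&target_sock, &key);
	if (!sk)
		return SK_PASS;		/* no socket: regular lookup runs */

	bpf_sk_assign(ctx, sk, 0);	/* select this socket for the packet */
	bpf_sk_release(sk);
	return SK_PASS;
}

char _license[] SEC("license") = "GPL";

The program attaches to a network namespace (BPF_SK_LOOKUP attach type),
and because the hook runs after the established (TCP) / connected (UDP)
lookup, a program like this never sees packets that already match such a
socket.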
>> It sounds like you are looking for an efficient way to lookup a
>> connected UDP socket. We would be interested in that as well. We use
>> connected UDP/QUIC on egress where we don't expect the peer to roam and
>> change its address. There's a memory cost on the kernel side to using
>> them, but they make it easier to structure your application, because you
>> can have roughly the same design for TCP and UDP transport.
>>
> Yes, we have exactly the same problem.
Good to know that there are other users of connected UDP out there.
Loosely related - I'm planning to raise the question of whether using
connected UDP sockets on ingress makes sense for QUIC at Plumbers [2].
Connected UDP lookup performance is one of the aspects here.
>> So what if instead of doing it in BPF, we make it better for everyone
>> and introduce a hash table keyed by 4-tuple for connected sockets in the
>> udp stack itself (counterpart of ehash in tcp)?
>
> This solution is also OK with me. But I'm not sure whether there have been
> previous attempts or technical problems with it.
>
> In fact, I have done a simple test with 4-tuple UDP lookup, and it does make a
> difference:
> (kernel 5.10, 1000 connected UDP sockets on the server, sockperf sending
> messages to one of them, results averaged over 5s)
>
> Without 4-tuple lookup:
>
> %Cpu0: 0.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 100.0 si, 0.0 st
> %Cpu1: 0.2 us, 0.2 sy, 0.0 ni, 99.4 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
> MiB Mem :7625.1 total, 6761.5 free, 210.2 used, 653.4 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 7176.2 avail Mem
>
> ---
> With 4-tuple lookup:
>
> %Cpu0: 0.2 us, 0.4 sy, 0.0 ni, 48.1 id, 0.0 wa, 1.2 hi, 50.1 si, 0.0 st
> %Cpu1: 0.6 us, 0.4 sy, 0.0 ni, 98.8 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
> MiB Mem :7625.1 total, 6759.9 free, 211.9 used, 653.3 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 7174.6 avail Mem
Right. The overhead is expected. All of the server's connected sockets end up
in one hash bucket, and we need to walk a long chain on lookup.
The workaround is not "pretty": you have to configure your server to
receive on multiple IP addresses and/or ports :-/
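To make the bucket collision concrete, here is a toy userspace sketch
(names and hash function are made up; this is not the kernel code):
keying only on local address + port puts every connected socket of one
server into the same slot, while an ehash-style key over the full
4-tuple spreads them across buckets, so the chains stay short.

/* Toy illustration: local-only key vs. 4-tuple key for connected UDP. */
#include <stdint.h>
#include <stdio.h>

struct flow {
	uint32_t laddr, raddr;
	uint16_t lport, rport;
};

static uint32_t mix(uint32_t h, uint32_t v)
{
	return (h ^ v) * 2654435761u;	/* made-up multiplicative mix */
}

/* Key over local address + port only: one server, one bucket. */
static uint32_t hash_local(const struct flow *f, uint32_t mask)
{
	return mix(mix(0, f->laddr), f->lport) & mask;
}

/* Hypothetical ehash-style key over the full 4-tuple. */
static uint32_t hash_4tuple(const struct flow *f, uint32_t mask)
{
	uint32_t h = mix(mix(0, f->laddr), f->lport);

	return mix(mix(h, f->raddr), f->rport) & mask;
}

int main(void)
{
	/* Two flows to the same server address, different peers. */
	struct flow a = { .laddr = 0x0a000001, .raddr = 0xc0a80001,
			  .lport = 443, .rport = 50001 };
	struct flow b = { .laddr = 0x0a000001, .raddr = 0xc0a80002,
			  .lport = 443, .rport = 50002 };

	printf("local-only key: %u vs %u (collide)\n",
	       hash_local(&a, 255), hash_local(&b, 255));
	printf("4-tuple key:    %u vs %u\n",
	       hash_4tuple(&a, 255), hash_4tuple(&b, 255));
	return 0;
}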
[1] Which also respects established / connected sockets, as long as they
have the IP_TRANSPARENT flag set. Users need to set it "manually" for UDP.
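    For reference, setting the flag from userspace looks roughly like the
    sketch below; it typically requires CAP_NET_ADMIN.

    /* Sketch: mark a UDP socket with IP_TRANSPARENT "manually". */
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
    	int one = 1;
    	int fd = socket(AF_INET, SOCK_DGRAM, 0);

    	if (fd < 0 ||
    	    setsockopt(fd, SOL_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0) {
    		perror("IP_TRANSPARENT");	/* needs privileges */
    		return 1;
    	}
    	/* bind()/connect() as usual from here on */
    	close(fd);
    	return 0;
    }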
[2] https://lpc.events/event/18/abstracts/2134/