[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <854e2fce-9d34-4472-b7b8-f66248f3ff01@linux.dev>
Date: Tue, 7 Oct 2025 13:32:30 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Jordan Rife <jordan@...fe.io>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Stanislav Fomichev
<sdf@...ichev.me>, Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Kuniyuki Iwashima <kuniyu@...gle.com>, Aditi Ghag
<aditi.ghag@...valent.com>, bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kerneljasonxing@...il.com>
Subject: Re: [RFC PATCH bpf-next 00/14] bpf: Efficient socket destruction
On 10/6/25 1:19 PM, Jordan Rife wrote:
> Yeah, this is a bit tricky. I'll have to think a bit more about how
> this would work. The ETIMEOUT thing would work for TCP, but if I'm
> trying to extend this to UDP sockets I think you may need an explicit
> bpf_sock_destroy() call anyway? And if you're making
> bpf_sock_destroy() work in that context then maybe supporting ETIMEOUT
> is redundant?
>
>> [ Unrelated, but in case it needs a new BPF_SOCK_OPS_*_CB enum. I would mostly
>> freeze any new BPF_SOCK_OPS_*_CB addition and requiring to move the bpf_sock_ops
>> to the struct_ops infrastructure first before adding new ops. ]
>
> Thanks, I'll look into this. One aspect I'm uncertain about is
> applying this kind of approach to UDP sockets. The BPF_SOCK_OPS_RTO_CB
> callback provides a convenient place to handle this for TCP, but UDP
> doesn't exactly have any timeouts where a similar callback makes
> sense. Instead, you'd need to have something like a callback for UDP
> that executes on every sendmsg call where you run some logic similar
> to the code above. This is less ideal, since you need to do extra work
> on every sendmsg call instead of just when a timeout occurs as with
Yeah, regardless of ETIMEOUT or bpf_sock_destroy(), I think the
BPF_SOCK_OPS_RTO_CB is better for TCP because of no overhead on the fastpath msg.
> BPF_SOCK_OPS_RTO_CB, but maybe the extra cost here would be
> negligible. Combined, I imagine something like this:
>
> switch (op) {
> case BPF_SOCK_OPS_RTO_CB:
> case BPF_SOCK_OPS_UDP_SENDMSG_CB:
Beside the fastpath msg overhead, I hate to say this, no new CB enum can be
added. I was hoping the only exception is the pending udp timestamping work but
it has been pending for too long, so we have to move on.
The bpf_sock_ops needs to move to struct_ops first. I suspect some of the
bpf_sock_destroy() hiccup being faced here is that the running context is only
known at runtime as an enum instead of something static that the verifier can
help to check the right kfunc to use. Once struct_ops is ready, adding a sendmsg
ops will be in general useful.
If TCP is solved with the existing BPF_SOCK_OPS_RTO_CB+ETIMEOUT, the remaining
is UDP and it seems you are interested in connected (iirc?) UDP only. It is why
I asked how many UDP sockets you may have in production.
> I think using socket callbacks like BPF_SOCK_OPS_RTO_CB would make for
> a more elegant solution and wouldn't require as much bookkeeping,
yeah, if it needs to iterate less, it has to do its own bookkeeping. This patch
uses the sock_hash but it can also be done in the bpf list/rb/arena also. The
bpf_sock_destroy_might_sleep() should be strict forward. The
SEC("syscall")+bpf_sock_destroy_might_sleep could be useful for other use cases
also.
Powered by blists - more mailing lists