netdev - Re: [RFC PATCH bpf-next 00/14] bpf: Efficient socket destruction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <854e2fce-9d34-4472-b7b8-f66248f3ff01@linux.dev>
Date: Tue, 7 Oct 2025 13:32:30 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Jordan Rife <jordan@...fe.io>
Cc: Alexei Starovoitov <ast@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>, Stanislav Fomichev
 <sdf@...ichev.me>, Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 Kuniyuki Iwashima <kuniyu@...gle.com>, Aditi Ghag
 <aditi.ghag@...valent.com>, bpf@...r.kernel.org, netdev@...r.kernel.org,
 Jason Xing <kerneljasonxing@...il.com>
Subject: Re: [RFC PATCH bpf-next 00/14] bpf: Efficient socket destruction

On 10/6/25 1:19 PM, Jordan Rife wrote:

> Yeah, this is a bit tricky. I'll have to think a bit more about how
> this would work. The ETIMEOUT thing would work for TCP, but if I'm
> trying to extend this to UDP sockets I think you may need an explicit
> bpf_sock_destroy() call anyway? And if you're making
> bpf_sock_destroy() work in that context then maybe supporting ETIMEOUT
> is redundant?
> 
>> [ Unrelated, but in case it needs a new BPF_SOCK_OPS_*_CB enum. I would mostly
>> freeze any new BPF_SOCK_OPS_*_CB addition and requiring to move the bpf_sock_ops
>> to the struct_ops infrastructure first before adding new ops. ]
> 
> Thanks, I'll look into this. One aspect I'm uncertain about is
> applying this kind of approach to UDP sockets. The BPF_SOCK_OPS_RTO_CB
> callback provides a convenient place to handle this for TCP, but UDP
> doesn't exactly have any timeouts where a similar callback makes
> sense. Instead, you'd need to have something like a callback for UDP
> that executes on every sendmsg call where you run some logic similar
> to the code above. This is less ideal, since you need to do extra work
> on every sendmsg call instead of just when a timeout occurs as with

Yeah, regardless of ETIMEOUT or bpf_sock_destroy(), I think the 
BPF_SOCK_OPS_RTO_CB is better for TCP because of no overhead on the fastpath msg.

> BPF_SOCK_OPS_RTO_CB, but maybe the extra cost here would be
> negligible. Combined, I imagine something like this:
> 
> switch (op) {
> case BPF_SOCK_OPS_RTO_CB:
> case BPF_SOCK_OPS_UDP_SENDMSG_CB:
Beside the fastpath msg overhead, I hate to say this, no new CB enum can be 
added. I was hoping the only exception is the pending udp timestamping work but 
it has been pending for too long, so we have to move on.

The bpf_sock_ops needs to move to struct_ops first. I suspect some of the 
bpf_sock_destroy() hiccup being faced here is that the running context is only 
known at runtime as an enum instead of something static that the verifier can 
help to check the right kfunc to use. Once struct_ops is ready, adding a sendmsg 
ops will be in general useful.

If TCP is solved with the existing BPF_SOCK_OPS_RTO_CB+ETIMEOUT, the remaining 
is UDP and it seems you are interested in connected (iirc?) UDP only. It is why 
I asked how many UDP sockets you may have in production.
> I think using socket callbacks like BPF_SOCK_OPS_RTO_CB would make for
> a more elegant solution and wouldn't require as much bookkeeping,
yeah, if it needs to iterate less, it has to do its own bookkeeping. This patch 
uses the sock_hash but it can also be done in the bpf list/rb/arena also. The 
bpf_sock_destroy_might_sleep() should be strict forward. The 
SEC("syscall")+bpf_sock_destroy_might_sleep could be useful for other use cases 
also.