netdev - Re: [RFC PATCH bpf-next 00/14] bpf: Efficient socket destruction

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <d2ncodvtjvi635zlg56yomqvjdkg3ilrpcgwhpoot362ayz3oe@ck37mtffvcep>
Date: Sat, 11 Oct 2025 11:04:05 -0700
From: Jordan Rife <jordan@...fe.io>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Stanislav Fomichev <sdf@...ichev.me>, 
	Willem de Bruijn <willemdebruijn.kernel@...il.com>, Kuniyuki Iwashima <kuniyu@...gle.com>, 
	Aditi Ghag <aditi.ghag@...valent.com>, bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kerneljasonxing@...il.com>
Subject: Re: [RFC PATCH bpf-next 00/14] bpf: Efficient socket destruction

On Tue, Oct 07, 2025 at 01:32:30PM -0700, Martin KaFai Lau wrote:
> On 10/6/25 1:19 PM, Jordan Rife wrote:
> 
> > Yeah, this is a bit tricky. I'll have to think a bit more about how
> > this would work. The ETIMEOUT thing would work for TCP, but if I'm
> > trying to extend this to UDP sockets I think you may need an explicit
> > bpf_sock_destroy() call anyway? And if you're making
> > bpf_sock_destroy() work in that context then maybe supporting ETIMEOUT
> > is redundant?
> > 
> > > [ Unrelated, but in case it needs a new BPF_SOCK_OPS_*_CB enum. I would mostly
> > > freeze any new BPF_SOCK_OPS_*_CB addition and requiring to move the bpf_sock_ops
> > > to the struct_ops infrastructure first before adding new ops. ]
> > 
> > Thanks, I'll look into this. One aspect I'm uncertain about is
> > applying this kind of approach to UDP sockets. The BPF_SOCK_OPS_RTO_CB
> > callback provides a convenient place to handle this for TCP, but UDP
> > doesn't exactly have any timeouts where a similar callback makes
> > sense. Instead, you'd need to have something like a callback for UDP
> > that executes on every sendmsg call where you run some logic similar
> > to the code above. This is less ideal, since you need to do extra work
> > on every sendmsg call instead of just when a timeout occurs as with
> 
> Yeah, regardless of ETIMEOUT or bpf_sock_destroy(), I think the
> BPF_SOCK_OPS_RTO_CB is better for TCP because of no overhead on the fastpath
> msg.
> 
> > BPF_SOCK_OPS_RTO_CB, but maybe the extra cost here would be
> > negligible. Combined, I imagine something like this:
> > 
> > switch (op) {
> > case BPF_SOCK_OPS_RTO_CB:
> > case BPF_SOCK_OPS_UDP_SENDMSG_CB:
> Beside the fastpath msg overhead, I hate to say this, no new CB enum can be
> added. I was hoping the only exception is the pending udp timestamping work
> but it has been pending for too long, so we have to move on.

Ah yeah, understood. I was a bit unclear in my last email. I get that
sockops is closed to extension until migration to struct_ops. I guess my
question was would it be worth trying to move bpf_sock_ops to struct_ops
at this stage, but this seems like a heavy lift so is probably best left
for later.

> The bpf_sock_ops needs to move to struct_ops first. I suspect some of the
> bpf_sock_destroy() hiccup being faced here is that the running context is
> only known at runtime as an enum instead of something static that the
> verifier can help to check the right kfunc to use. Once struct_ops is ready,
> adding a sendmsg ops will be in general useful.
> 
> If TCP is solved with the existing BPF_SOCK_OPS_RTO_CB+ETIMEOUT, the
> remaining is UDP and it seems you are interested in connected (iirc?) UDP
> only. It is why I asked how many UDP sockets you may have in production.
> > I think using socket callbacks like BPF_SOCK_OPS_RTO_CB would make for
> > a more elegant solution and wouldn't require as much bookkeeping,
> yeah, if it needs to iterate less, it has to do its own bookkeeping. This
> patch uses the sock_hash but it can also be done in the bpf list/rb/arena
> also. The bpf_sock_destroy_might_sleep() should be strict forward. The
> SEC("syscall")+bpf_sock_destroy_might_sleep could be useful for other use
> cases also.

For now, I'll look into the BPF_SOCK_OPS_RTO_CB+ETIMEOUT mechanism for
the TCP case and bpf_sock_destroy_might_sleep() for the connected UDP
case if that sounds good to you. Even if the UDP case could be handled
in a simpler way eventually with struct_ops (e.g. a sendmsg callback),
bpf_sock_destroy_might_sleep() would be useful to have around as you
say.

I'm curious if the migration to struct_ops is already underway. If not,
I'd be interested in exploring this more later on.

Thanks!

Jordan