Message-ID: <1974322e-8c30-4c01-a566-642ed2bc7086@linux.dev>
Date: Thu, 20 Mar 2025 22:46:55 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Jordan Rife <jrife@...gle.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org,
Daniel Borkmann <daniel@...earbox.net>,
Yonghong Song <yonghong.song@...ux.dev>,
Aditi Ghag <aditi.ghag@...valent.com>
Subject: Re: [RFC PATCH bpf-next 0/3] Avoid skipping sockets with socket iterators
On 3/18/25 5:23 PM, Jordan Rife wrote:
>> imo, this is not a problem for bpf. The bpf prog has access to many fields of a
>> udp_sock (ip addresses, ports, state...etc) to make the right decision. The bpf
>> prog can decide if that rehashed socket needs to be bpf_sock_destroy(), e.g. the
>> saddr in this case because of inet_reset_saddr(sk) before the rehash. From the
>> bpf prog's pov, the rehashed udp_sock is not much different from a new udp_sock
>> getting added from the userspace into the later bucket.
>
> As a user of BPF iterators, I would, and did, find this behavior quite
> surprising. If BPF iterators make no promises about visiting each
> thing exactly once, then should that be made explicit somewhere (maybe
> it already is?)? I think the natural thing for a user is to assume
> that an iterator will only visit each "thing" once and to write their
I can see the argument that the bpf_sock_destroy() kfunc does not work as
expected if the expectation is that the sk will not be rehashed. Is that your
use case? I am open to adding another bpf_sock_destroy() kfunc that disallows
the rehash, but that would differ from the current udp_disconnect() behavior
and would need a separate discussion. I currently don't have this use case,
though.
> code accordingly. Using my example from before, counting the number of
> sockets I destroyed, needs to be implemented differently if I might
> revisit the same socket during iteration by explicitly filtering for
> duplicates inside the BPF program (possibly by filtering out sockets
> where the state is TCP_CLOSE, for example) or userspace. While in this
> particular example it isn't all that important if I get the count
> wrong, how do we know other users of BPF iterators won't make the same
> assumption where repeats matter more? I still think it would be nice
> if iterators themselves guaranteed exactly-once semantics but
> understand if this isn't the direction you want BPF iterators to go.
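
To make the duplicate-visit concern above concrete, here is a toy userspace
model in C. It is only a sketch and not kernel or BPF code: struct toy_sock,
toy_destroy(), toy_iterate(), and NBUCKETS are all hypothetical names, and the
model simply assumes that a destroyed socket is moved to TCP_CLOSE-like state
and rehashed into the last bucket, so that an in-order bucket walk can meet it
a second time. The skip_closed flag stands in for the "filter out sockets
where the state is TCP_CLOSE" idea mentioned in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_state { TOY_ESTABLISHED, TOY_CLOSE };

struct toy_sock {
	unsigned long cookie;   /* stable identity of the socket */
	enum toy_state state;   /* stands in for sk->sk_state */
	size_t bucket;          /* hash bucket currently holding the socket */
};

#define NBUCKETS 4

/* Assumption: destroying a socket closes it and rehashes it elsewhere. */
static void toy_destroy(struct toy_sock *sk, size_t new_bucket)
{
	assert(new_bucket < NBUCKETS);
	sk->state = TOY_CLOSE;
	sk->bucket = new_bucket;
}

/*
 * Walk the buckets in order and "destroy" every socket found, returning how
 * many destroys were counted. With skip_closed set, sockets that are already
 * closed are ignored, which removes the double count for rehashed sockets.
 */
static int toy_iterate(struct toy_sock *socks, size_t n, bool skip_closed)
{
	int destroyed = 0;

	for (size_t b = 0; b < NBUCKETS; b++) {
		for (size_t i = 0; i < n; i++) {
			struct toy_sock *sk = &socks[i];

			if (sk->bucket != b)
				continue;
			if (skip_closed && sk->state == TOY_CLOSE)
				continue;

			/* Hypothetical rehash target: always the last bucket. */
			toy_destroy(sk, NBUCKETS - 1);
			destroyed++;
		}
	}
	return destroyed;
}
```

In this model, three sockets starting in buckets 0, 1, and 3 yield a count of
5 when toy_iterate() is called with skip_closed = false, because the two
rehashed sockets are seen again in the last bucket, and a count of 3 with
skip_closed = true. Whether a real rehash lands in a later bucket depends on
the new hash, so this only illustrates the case under discussion.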