netdev - Re: [RFC PATCH bpf-next 0/3] Avoid skipping sockets with socket iterators

Message-ID: <CADKFtnQyiz_r_vfyYfTvzi3MvNpRt62mDrNyEvp9tm82UcSFjQ@mail.gmail.com>
Date: Tue, 18 Mar 2025 16:09:08 -0700
From: Jordan Rife <jrife@...gle.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, 
	Daniel Borkmann <daniel@...earbox.net>, Yonghong Song <yonghong.song@...ux.dev>, 
	Aditi Ghag <aditi.ghag@...valent.com>
Subject: Re: [RFC PATCH bpf-next 0/3] Avoid skipping sockets with socket iterators

To add to this, I encountered some strange behavior today where using
bpf_sock_destroy() actually /causes/ sockets to repeat during
iteration. In my environment, I have a single socket in a network
namespace and a socket iterator that destroys it. The iterator visits
the same socket twice and calls bpf_sock_destroy() twice as a result.
In the UDP case (and maybe TCP, I haven't checked), bpf_sock_destroy()
can call udp_abort() (sk->sk_prot->diag_destroy()) ->
__udp_disconnect() -> udp_v4_rehash() (sk->sk_prot->rehash(sk)), which
rehashes the socket and moves it to a new bucket. Depending on where
the socket lands, you may encounter it again as you progress through
the buckets. Some inspection with bpftrace seems to confirm this.
Unlike the edge cases I described before, this one is likely to come
up in practice. I noticed it when I tried to use bpf_seq_write() to
write something for every socket that got destroyed, so that userspace
could get an accurate count at the end, which seems like a fairly
valid use case.
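
The repeat can be reproduced in a toy userspace model. This is only a
sketch under simplifying assumptions: the table is a flat array with a
per-socket bucket number, the walk handles one socket at a time, and
toy_destroy() stands in for the bpf_sock_destroy() -> rehash path. All
toy_* names are hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBUCKETS 4
#define TOY_MAX_SOCKS 8

struct toy_sock {
    int bucket;  /* bucket the socket currently hashes to */
    int closed;  /* set once the socket has been destroyed */
};

struct toy_table {
    struct toy_sock socks[TOY_MAX_SOCKS];
    size_t count;
};

/* Hypothetical stand-in for destroying a UDP socket: mark it closed
 * and rehash it into new_bucket. */
static void toy_destroy(struct toy_sock *sk, int new_bucket)
{
    assert(new_bucket >= 0 && new_bucket < TOY_NBUCKETS);
    sk->closed = 1;
    sk->bucket = new_bucket;
}

/* Walk the buckets in ascending order and destroy every socket found.
 * Returns how many times toy_destroy() was called. */
static int toy_iterate_and_destroy(struct toy_table *t, int rehash_bucket)
{
    int calls = 0;

    for (int b = 0; b < TOY_NBUCKETS; b++) {
        for (size_t i = 0; i < t->count; i++) {
            struct toy_sock *sk = &t->socks[i];

            if (sk->bucket != b)
                continue;
            toy_destroy(sk, rehash_bucket);
            calls++;
        }
    }
    return calls;
}
```

With a single socket that starts in bucket 0 and is rehashed into
bucket 2, toy_iterate_and_destroy() returns 2, since the walk reaches
the socket again later. If the socket starts in bucket 1 and is
rehashed into bucket 0, which the walk has already passed, it returns
1.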

I'm not sure of the best way to avoid this. __udp_disconnect() sets
sk->sk_state to TCP_CLOSE, so filtering out sockets in that state
during iteration would avoid repeating sockets you've destroyed, but
it may be a bit coarse-grained; you could inadvertently skip other
sockets that you don't want to skip. The approach in the RFC would
work, since you could just skip any socket where abs(sk->sk_idx) is
greater than whatever the table version was when you started
iterating, basically iterating only over what was in your initial
"table snapshot", but maybe there's a simpler approach.
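
The two filters can be compared in a toy userspace model. This is a
rough sketch, not the RFC's implementation: it assumes a table-wide
version counter that is bumped on every insert and rehash and stamped
into a per-socket idx field (a stand-in for sk->sk_idx). All snap_*
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SNAP_NBUCKETS 4
#define SNAP_MAX_SOCKS 8

struct snap_sock {
    int bucket;
    int closed;  /* 1 when the socket is in the closed state */
    long idx;    /* assumed: table version at last insert/rehash */
};

struct snap_table {
    struct snap_sock socks[SNAP_MAX_SOCKS];
    size_t count;
    long version;  /* assumed: bumped on every insert or rehash */
};

static void snap_insert(struct snap_table *t, int bucket)
{
    assert(t->count < SNAP_MAX_SOCKS);
    assert(bucket >= 0 && bucket < SNAP_NBUCKETS);
    struct snap_sock *sk = &t->socks[t->count++];
    sk->bucket = bucket;
    sk->closed = 0;
    sk->idx = ++t->version;
}

/* Destroy = mark closed, rehash into new_bucket, restamp idx. */
static void snap_destroy(struct snap_table *t, struct snap_sock *sk,
                         int new_bucket)
{
    sk->closed = 1;
    sk->bucket = new_bucket;
    sk->idx = ++t->version;
}

enum snap_filter { SNAP_NONE, SNAP_SKIP_CLOSED, SNAP_SKIP_NEWER };

/* Walk buckets in order, destroying sockets that pass the filter.
 * Returns the number of destroy calls made. */
static int snap_iterate(struct snap_table *t, enum snap_filter f,
                        int rehash_bucket)
{
    long start = t->version;  /* the "table snapshot" */
    int calls = 0;

    for (int b = 0; b < SNAP_NBUCKETS; b++) {
        for (size_t i = 0; i < t->count; i++) {
            struct snap_sock *sk = &t->socks[i];

            if (sk->bucket != b)
                continue;
            if (f == SNAP_SKIP_CLOSED && sk->closed)
                continue;
            if (f == SNAP_SKIP_NEWER && labs(sk->idx) > start)
                continue;
            snap_destroy(t, sk, rehash_bucket);
            calls++;
        }
    }
    return calls;
}
```

Take two sockets in buckets 0 and 1, the second already in the closed
state before iteration starts, with destroyed sockets rehashed into
bucket 3. SNAP_NONE makes 4 destroy calls (each socket twice),
SNAP_SKIP_CLOSED makes 1 (it never touches the pre-closed socket), and
SNAP_SKIP_NEWER makes 2 (each socket exactly once).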

-Jordan