lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADKFtnQyiz_r_vfyYfTvzi3MvNpRt62mDrNyEvp9tm82UcSFjQ@mail.gmail.com>
Date: Tue, 18 Mar 2025 16:09:08 -0700
From: Jordan Rife <jrife@...gle.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, 
	Daniel Borkmann <daniel@...earbox.net>, Yonghong Song <yonghong.song@...ux.dev>, 
	Aditi Ghag <aditi.ghag@...valent.com>
Subject: Re: [RFC PATCH bpf-next 0/3] Avoid skipping sockets with socket iterators

To add to this, I actually encountered some strange behavior today
where using bpf_sock_destroy actually /causes/ sockets to repeat
during iteration. In my environment, I just have one socket in a
network namespace with a socket iterator that destroys it. The
iterator visits the same socket twice and calls bpf_sock_destroy twice
as a result. In the UDP case (and maybe TCP, I haven't checked)
bpf_sock_destroy() can call udp_abort (sk->sk_prot->diag_destroy()) ->
__udp_disconnect() -> udp_v4_rehash() (sk->sk_prot->rehash(sk)) which
rehashes the socket and moves it to a new bucket. Depending on where a
socket lands, you may encounter it again as you progress through the
buckets. Doing some inspection with bpftrace seems to confirm this. As
opposed to the edge cases I described before, this is more likely. I
noticed this when I tried to use bpf_seq_write to write something for
every socket that got deleted for an accurate count at the end in
userspace which seems like a fairly valid use case.

Not sure the best way to avoid this. __udp_disconnect() sets
sk->sk_state to TCP_CLOSE, so filtering out sockets like that during
iteration would avoid repeating sockets you've destroyed, but may be a
bit course-grained; you could inadvertently skip other sockets that
you don't want to skip. The approach in the RFC would work, since you
could just avoid any sockets where abs(sk->sk_idx) > whatever the
table version was when you started iterating, basically iterating only
over what was in your initial "table snapshot", but maybe there's a
simpler approach.

-Jordan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ