Message-ID: <posiabqearkkbt3o4l4yueyn3kl6jvw2r4fuxceabgju2etg7x@m7fepnnvkgjj>
Date: Tue, 24 Jun 2025 12:49:07 -0700
From: Jordan Rife <jordan@...fe.io>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org,
Daniel Borkmann <daniel@...earbox.net>, Martin KaFai Lau <martin.lau@...ux.dev>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>, Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: [RESEND PATCH v2 bpf-next 02/12] bpf: tcp: Make sure iter->batch
always contains a full bucket snapshot
> Can we try to unroll this? Add new helpers to hide the repeating parts,
> store extra state in iter if needed.
>
> AFAIU, we want the following:
> 1. find sk, try to fill the batch, if it fits -> bail out
> 2. try to allocate new batch with GFP_USER, try to fill again -> bail
> out
> 3. otherwise, attempt GFP_NOWAIT and do that dance where you copy over
> previous partial copy
>
> The conditional put in bpf_iter_tcp_put_batch does not look nice :-(
With the unrolling, I think it should be simple enough to just call
bpf_iter_tcp_put_batch in the right place at the call site instead of
embedding it conditionally inside bpf_iter_tcp_realloc_batch. Agreed,
this might be clearer.
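Roughly something like this at the call site (untested sketch; the exact
arguments and error handling for bpf_iter_tcp_realloc_batch here are
just illustrative, not necessarily what the patch ends up with):

/* Not enough room for a full snapshot: release the refs we already
 * hold here at the call site, then grow the batch. The realloc helper
 * then no longer needs a conditional put inside it.
 */
bpf_iter_tcp_put_batch(iter);
err = bpf_iter_tcp_realloc_batch(iter, expected, GFP_USER);
if (err)
	return ERR_PTR(err);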
> Same for unconditional memcpy (which, if I understand correctly, only
> needed for GFP_NOWAIT case). I'm 99% sure your current version works,
The memcpy matters for both cases. Later in this series, it is needed to
copy the socket cookies stored in iter->batch so we can find our place
in the bucket again after reacquiring the lock. IMO it still belongs
here: in both cases, we need to copy the contents of the old batch over
before freeing it.
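Concretely, I'm picturing the helper always carrying the old contents
over, something like this (untested, simplified sketch with stand-in
struct and field names, not the actual patch):

/* Stand-in for the iterator batch state; field names here are
 * illustrative rather than the real bpf_tcp_iter_state layout.
 */
struct iter_batch_state {
	struct sock **batch;	/* current bucket snapshot */
	unsigned int max_sks;	/* capacity of @batch */
	unsigned int end_sk;	/* number of valid entries */
};

static int iter_realloc_batch(struct iter_batch_state *iter,
			      unsigned int new_sz, gfp_t flags)
{
	struct sock **new_batch;

	new_batch = kvmalloc_array(new_sz, sizeof(*new_batch), flags);
	if (!new_batch)
		return -ENOMEM;

	/* Always carry the old entries (socket pointers now, socket
	 * cookies later in the series) over to the new batch so we can
	 * find our place in the bucket again; releasing refs on the old
	 * entries, when that is needed, is left to the caller.
	 */
	memcpy(new_batch, iter->batch,
	       sizeof(*new_batch) * iter->end_sk);
	kvfree(iter->batch);
	iter->batch = new_batch;
	iter->max_sks = new_sz;

	return 0;
}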
> but it's a bit hard to follow :-(
>
> Untested code to illustrate the idea below. Any reason it won't work?
After revisiting the code, I now remember why I didn't do something like
this before.
>
> /* fast path */
>
> sk = tcp_seek_last_pos(seq);
> if (!sk) return NULL;
> fits = bpf_iter_tcp_fill_batch(...);
> bpf_iter_tcp_unlock_bucket(iter);
> if (fits) return sk;
>
> /* not enough space to store full batch, try to reallocate with GFP_USER */
>
> bpf_iter_tcp_free_batch(iter);
>
> if (bpf_iter_tcp_alloc_batch(iter, GFP_USER)) {
> /* allocated 'expected' size, try to fill again */
>
> sk = tcp_seek_last_pos(seq);
Since you release the lock on the bucket above, and it could have
changed in various interesting ways in the meantime (e.g. maybe it's
empty now), tcp_seek_last_pos may have moved on to a different bucket.
> if (!sk) return NULL;
> fits = bpf_iter_tcp_fill_batch(...);
If that new bucket is bigger, this fails and we immediately move on to
the GFP_NOWAIT block. Before, we were trying to avoid falling back to
GFP_NOWAIT whenever possible; it was only there to ensure we could
capture a full snapshot in the case of a fast-growing bucket. With the
unrolled logic, we widen the set of scenarios where we use GFP_NOWAIT.
With the loop (goto again; see the rough sketch after your quoted code
below) we would just realloc with GFP_USER again if the bucket had
advanced. The original intent was to try GFP_USER once per bucket, but
unrolling shifts this to once per call.
> if (fits) {
> bpf_iter_tcp_unlock_bucket(iter);
> return sk;
> }
> }
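For comparison, the loop variant I'm describing looks roughly like this
(untested sketch reusing the helper names from your snippet; the "once
per bucket" bookkeeping is glossed over with a plain resized flag here,
so take the control flow as illustrative only):

again:
	sk = tcp_seek_last_pos(seq);
	if (!sk)
		return NULL;

	fits = bpf_iter_tcp_fill_batch(...);
	if (fits) {
		bpf_iter_tcp_unlock_bucket(iter);
		return sk;
	}

	/* Batch is too small for this bucket. If we haven't grown it yet,
	 * drop our refs, release the lock, and retry with GFP_USER; if
	 * the bucket has changed by the time we get back, we just go
	 * around again. (In the real thing, moving on to a different
	 * bucket would re-arm the GFP_USER attempt, which is what makes
	 * this "once per bucket" rather than "once per call".)
	 */
	if (!resized) {
		bpf_iter_tcp_put_batch(iter);
		bpf_iter_tcp_unlock_bucket(iter);
		if (bpf_iter_tcp_realloc_batch(iter, expected, GFP_USER))
			return ERR_PTR(-ENOMEM);
		resized = true;
		goto again;
	}

	/* Already resized and the bucket still outgrew the batch while
	 * the lock was dropped: last resort, grow in place with
	 * GFP_NOWAIT while still holding the lock and copy the partial
	 * batch over, as in the patch.
	 */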
Both approaches work. Overall, the unrolled logic is slightly clearer
but makes the GFP_NOWAIT fallback slightly more likely, while the loop
logic is slightly less clear but makes the GFP_NOWAIT fallback less
likely.
In practice, the difference is probably negligible though, so yeah, it
might be better to just favor clarity here. Let me go ahead and try to
unroll this. If I run into any issues that make it impractical, I'll let
you know.
Jordan