Message-ID: <20250417224219.29946-1-kuniyu@amazon.com>
Date: Thu, 17 Apr 2025 15:41:48 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <jordan@...fe.io>
CC: <aditi.ghag@...valent.com>, <bpf@...r.kernel.org>, <daniel@...earbox.net>,
<kuniyu@...zon.com>, <martin.lau@...ux.dev>, <netdev@...r.kernel.org>,
<willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot
From: Jordan Rife <jordan@...fe.io>
Date: Wed, 16 Apr 2025 16:36:17 -0700
> Require that iter->batch always contains a full bucket snapshot. This
> invariant is important to avoid skipping or repeating sockets during
> iteration when combined with the next few patches. Before, there were
> two cases where a call to bpf_iter_udp_batch may only capture part of a
> bucket:
>
> 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1].
> 2. When more sockets are added to the bucket while calling
> bpf_iter_udp_realloc_batch(), making the updated batch size
> insufficient [2].
>
> In cases where the batch size only covers part of a bucket, it is
> possible to forget which sockets were already visited, especially if we
> have to process a bucket in more than two batches. This forces us to
> choose between repeating or skipping sockets, so don't allow this:
>
> 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
> fails instead of continuing with a partial batch.
> 2. Retry bpf_iter_udp_realloc_batch() up to two times if we fail to
> capture the full bucket. On the third attempt, hold onto the bucket
> lock (hslot2->lock) through bpf_iter_udp_realloc_batch() with
> GFP_ATOMIC to guarantee that the bucket size doesn't change before
> our next attempt. Try with GFP_USER first to improve the chances
> that memory allocation succeeds; only use GFP_ATOMIC as a last
> resort.
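If I read the retry scheme above correctly, the resulting control flow
is roughly the following (a minimal sketch: fill_batch() and the local
variables are illustrative stand-ins, not the actual patch code; only
the GFP_USER/GFP_ATOMIC selection and locking matter here):

	unsigned int batch_sks;
	int attempt = 0, err;
	bool atomic = false;

	spin_lock_bh(&hslot2->lock);
	batch_sks = fill_batch(iter, hslot2);	/* snapshot bucket into iter->batch */

	while (batch_sks > iter->max_sk) {	/* batch too small for the bucket */
		atomic = (++attempt == 3);	/* last resort: realloc under the lock */
		if (!atomic)
			spin_unlock_bh(&hslot2->lock);

		err = bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2,
						 atomic ? GFP_ATOMIC : GFP_USER);
		if (err) {
			if (atomic)
				spin_unlock_bh(&hslot2->lock);
			return ERR_PTR(err);	/* propagate -ENOMEM to userspace */
		}

		if (!atomic)
			spin_lock_bh(&hslot2->lock);	/* bucket may have grown again */
		batch_sks = fill_batch(iter, hslot2);
	}
	spin_unlock_bh(&hslot2->lock);
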
kvmalloc() tries the kmalloc path, 1. slab and 2. page allocator, and
if both of them fail, it then tries 3. vmalloc(). But vmalloc() does
not support GFP_ATOMIC; __kvmalloc_node_noprof() returns early at the
!gfpflags_allow_blocking() check, so a large GFP_ATOMIC allocation
simply fails where only vmalloc() could have succeeded.
So, falling back to GFP_ATOMIC is very unlikely to work as the last
resort.
Trying GFP_ATOMIC first and then falling back to GFP_USER a few more
times, or not using GFP_ATOMIC at all, makes more sense to me.
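To illustrate, the relevant part of __kvmalloc_node_noprof() looks
roughly like this (simplified from mm/util.c with the gfp flag
adjustments omitted; not verbatim):

	void *kvmalloc_node(size_t size, gfp_t flags, int node)
	{
		void *ret;

		/* 1. slab / 2. page allocator */
		ret = kmalloc_node(size, flags, node);
		if (ret || size <= PAGE_SIZE)
			return ret;

		/* vmalloc() cannot handle non-sleeping allocations, so a
		 * GFP_ATOMIC request bails out here and never reaches the
		 * vmalloc() fallback.
		 */
		if (!gfpflags_allow_blocking(flags))
			return NULL;

		/* 3. vmalloc() */
		return __vmalloc_node(size, 1, flags, node,
				      __builtin_return_address(0));
	}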