netdev - Re: [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250417233303.37489-1-kuniyu@amazon.com>
Date: Thu, 17 Apr 2025 16:32:58 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <martin.lau@...ux.dev>
CC: <aditi.ghag@...valent.com>, <bpf@...r.kernel.org>, <daniel@...earbox.net>,
	<jordan@...fe.io>, <kuniyu@...zon.com>, <netdev@...r.kernel.org>,
	<willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot

From: Martin KaFai Lau <martin.lau@...ux.dev>
Date: Thu, 17 Apr 2025 16:12:57 -0700
> On 4/17/25 3:41 PM, Kuniyuki Iwashima wrote:
> > From: Jordan Rife <jordan@...fe.io>
> > Date: Wed, 16 Apr 2025 16:36:17 -0700
> >> Require that iter->batch always contains a full bucket snapshot. This
> >> invariant is important to avoid skipping or repeating sockets during
> >> iteration when combined with the next few patches. Before, there were
> >> two cases where a call to bpf_iter_udp_batch may only capture part of a
> >> bucket:
> >>
> >> 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1].
> >> 2. When more sockets are added to the bucket while calling
> >>     bpf_iter_udp_realloc_batch(), making the updated batch size
> >>     insufficient [2].
> >>
> >> In cases where the batch size only covers part of a bucket, it is
> >> possible to forget which sockets were already visited, especially if we
> >> have to process a bucket in more than two batches. This forces us to
> >> choose between repeating or skipping sockets, so don't allow this:
> >>
> >> 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
> >>     fails instead of continuing with a partial batch.
> >> 2. Retry bpf_iter_udp_realloc_batch() up to two times if we fail to
> >>     capture the full bucket. On the third attempt, hold onto the bucket
> >>     lock (hslot2->lock) through bpf_iter_udp_realloc_batch() with
> >>     GFP_ATOMIC to guarantee that the bucket size doesn't change before
> >>     our next attempt. Try with GFP_USER first to improve the chances
> >>     that memory allocation succeeds; only use GFP_ATOMIC as a last
> >>     resort.
> > 
> > kvmalloc() tries the kmalloc path, 1. slab and 2. page allocator, and
> > if both of them fails, then tries 3. vmalloc().  But, vmalloc() does not
> > support GFP_ATOMIC, __kvmalloc_node_noprof() returns at
> > !gfpflags_allow_blocking().
> > 
> > So, falling back to GFP_ATOMIC is most unlikely to work as the last resort.
> > 
> > GFP_ATOMIC first and falling back to GFP_USER few more times, or not using
> > GFP_ATOMIC makes more sense to me.
> 
> If I read it correctly, the last retry with GFP_ATOMIC is not because of the 
> earlier GFP_USER allocation failure but the size of the bucket has changed a lot 
> that it is doing one final attempt to get the whole bucket and this requires to 
> hold the bucket lock to ensure the size stays the same which then must use 
> GFP_ATOMIC.

Ah exactly, when allocation fails, it always returned an error.

Sorry, I should've read code first.