netdev - Re: [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <42b84ea3-b3c1-4839-acfc-bd182e7af313@linux.dev>
Date: Thu, 17 Apr 2025 16:12:57 -0700
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Kuniyuki Iwashima <kuniyu@...zon.com>, jordan@...fe.io
Cc: aditi.ghag@...valent.com, bpf@...r.kernel.org, daniel@...earbox.net,
 netdev@...r.kernel.org, willemdebruijn.kernel@...il.com
Subject: Re: [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always
 contains a full bucket snapshot

On 4/17/25 3:41 PM, Kuniyuki Iwashima wrote:
> From: Jordan Rife <jordan@...fe.io>
> Date: Wed, 16 Apr 2025 16:36:17 -0700
>> Require that iter->batch always contains a full bucket snapshot. This
>> invariant is important to avoid skipping or repeating sockets during
>> iteration when combined with the next few patches. Before, there were
>> two cases where a call to bpf_iter_udp_batch may only capture part of a
>> bucket:
>>
>> 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1].
>> 2. When more sockets are added to the bucket while calling
>>     bpf_iter_udp_realloc_batch(), making the updated batch size
>>     insufficient [2].
>>
>> In cases where the batch size only covers part of a bucket, it is
>> possible to forget which sockets were already visited, especially if we
>> have to process a bucket in more than two batches. This forces us to
>> choose between repeating or skipping sockets, so don't allow this:
>>
>> 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
>>     fails instead of continuing with a partial batch.
>> 2. Retry bpf_iter_udp_realloc_batch() up to two times if we fail to
>>     capture the full bucket. On the third attempt, hold onto the bucket
>>     lock (hslot2->lock) through bpf_iter_udp_realloc_batch() with
>>     GFP_ATOMIC to guarantee that the bucket size doesn't change before
>>     our next attempt. Try with GFP_USER first to improve the chances
>>     that memory allocation succeeds; only use GFP_ATOMIC as a last
>>     resort.
> 
> kvmalloc() tries the kmalloc path, 1. slab and 2. page allocator, and
> if both of them fails, then tries 3. vmalloc().  But, vmalloc() does not
> support GFP_ATOMIC, __kvmalloc_node_noprof() returns at
> !gfpflags_allow_blocking().
> 
> So, falling back to GFP_ATOMIC is most unlikely to work as the last resort.
> 
> GFP_ATOMIC first and falling back to GFP_USER few more times, or not using
> GFP_ATOMIC makes more sense to me.

If I read it correctly, the last retry with GFP_ATOMIC is not because of the 
earlier GFP_USER allocation failure but the size of the bucket has changed a lot 
that it is doing one final attempt to get the whole bucket and this requires to 
hold the bucket lock to ensure the size stays the same which then must use 
GFP_ATOMIC.