Message-ID: <20250417224219.29946-1-kuniyu@amazon.com>
Date: Thu, 17 Apr 2025 15:41:48 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <jordan@...fe.io>
CC: <aditi.ghag@...valent.com>, <bpf@...r.kernel.org>, <daniel@...earbox.net>,
<kuniyu@...zon.com>, <martin.lau@...ux.dev>, <netdev@...r.kernel.org>,
<willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH v3 bpf-next 2/6] bpf: udp: Make sure iter->batch always contains a full bucket snapshot
From: Jordan Rife <jordan@...fe.io>
Date: Wed, 16 Apr 2025 16:36:17 -0700
> Require that iter->batch always contains a full bucket snapshot. This
> invariant is important to avoid skipping or repeating sockets during
> iteration when combined with the next few patches. Before, there were
> two cases where a call to bpf_iter_udp_batch may only capture part of a
> bucket:
>
> 1. When bpf_iter_udp_realloc_batch() returns -ENOMEM [1].
> 2. When more sockets are added to the bucket while calling
> bpf_iter_udp_realloc_batch(), making the updated batch size
> insufficient [2].
>
> In cases where the batch size only covers part of a bucket, it is
> possible to forget which sockets were already visited, especially if we
> have to process a bucket in more than two batches. This forces us to
> choose between repeating or skipping sockets, so don't allow this:
>
> 1. Stop iteration and propagate -ENOMEM up to userspace if reallocation
> fails instead of continuing with a partial batch.
> 2. Retry bpf_iter_udp_realloc_batch() up to two times if we fail to
> capture the full bucket. On the third attempt, hold onto the bucket
> lock (hslot2->lock) through bpf_iter_udp_realloc_batch() with
> GFP_ATOMIC to guarantee that the bucket size doesn't change before
> our next attempt. Try with GFP_USER first to improve the chances
> that memory allocation succeeds; only use GFP_ATOMIC as a last
> resort.
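If I read the retry scheme above correctly, the resulting control flow
is roughly the following (a minimal sketch: fill_batch() and the local
variables are illustrative stand-ins, not the actual patch code; only
the GFP_USER/GFP_ATOMIC selection and locking matter here):

	unsigned int batch_sks;
	int attempt = 0, err;
	bool atomic = false;

	spin_lock_bh(&hslot2->lock);
	batch_sks = fill_batch(iter, hslot2);	/* snapshot bucket into iter->batch */

	while (batch_sks > iter->max_sk) {	/* batch too small for the bucket */
		atomic = (++attempt == 3);	/* last resort: realloc under the lock */
		if (!atomic)
			spin_unlock_bh(&hslot2->lock);

		err = bpf_iter_udp_realloc_batch(iter, batch_sks * 3 / 2,
						 atomic ? GFP_ATOMIC : GFP_USER);
		if (err) {
			if (atomic)
				spin_unlock_bh(&hslot2->lock);
			return ERR_PTR(err);	/* propagate -ENOMEM to userspace */
		}

		if (!atomic)
			spin_lock_bh(&hslot2->lock);	/* bucket may have grown again */
		batch_sks = fill_batch(iter, hslot2);
	}
	spin_unlock_bh(&hslot2->lock);
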
kvmalloc() tries the kmalloc path, 1. slab and 2. page allocator, and
if both of them fail, it then tries 3. vmalloc(). But vmalloc() does
not support GFP_ATOMIC; __kvmalloc_node_noprof() returns early at the
!gfpflags_allow_blocking() check, so a large GFP_ATOMIC allocation
simply fails where only vmalloc() could have succeeded.
So, falling back to GFP_ATOMIC is very unlikely to work as the last
resort.
Trying GFP_ATOMIC first and then falling back to GFP_USER a few more
times, or not using GFP_ATOMIC at all, makes more sense to me.
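To illustrate, the relevant part of __kvmalloc_node_noprof() looks
roughly like this (simplified from mm/util.c with the gfp flag
adjustments omitted; not verbatim):

	void *kvmalloc_node(size_t size, gfp_t flags, int node)
	{
		void *ret;

		/* 1. slab / 2. page allocator */
		ret = kmalloc_node(size, flags, node);
		if (ret || size <= PAGE_SIZE)
			return ret;

		/* vmalloc() cannot handle non-sleeping allocations, so a
		 * GFP_ATOMIC request bails out here and never reaches the
		 * vmalloc() fallback.
		 */
		if (!gfpflags_allow_blocking(flags))
			return NULL;

		/* 3. vmalloc() */
		return __vmalloc_node(size, 1, flags, node,
				      __builtin_return_address(0));
	}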