Message-ID: <1a158ad7-f988-42bf-9a1e-b673ff9122c2@igalia.com>
Date: Thu, 13 Feb 2025 17:41:29 +0900
From: Changwoo Min <changwoo@...lia.com>
To: Song Liu <song@...nel.org>
Cc: ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
martin.lau@...ux.dev, eddyz87@...il.com, yonghong.song@...ux.dev,
john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, tj@...nel.org, arighi@...dia.com,
kernel-dev@...lia.com, bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next] bpf: Add a retry after refilling the free list
when unit_alloc() fails
Hello Song,
Thank you for the review!
On 25. 2. 13. 03:33, Song Liu wrote:
> On Wed, Feb 12, 2025 at 12:49 AM Changwoo Min <changwoo@...lia.com> wrote:
>>
>> When there is no entry in the free list (c->free_llist), unit_alloc()
>> fails even when there is available memory in the system, causing allocation
>> failure in various BPF calls -- such as bpf_mem_alloc() and
>> bpf_cpumask_create().
>>
>> Such allocation failure can happen, especially when a BPF program tries many
>> allocations -- more than a delta between high and low watermarks -- in an
>> IRQ-disabled context.
>
> Can we add a selftests for this scenario?
It would be a bit tricky to create an IRQ-disabled case in a BPF
program. However, I think it should be possible to reproduce the
allocation failure by performing a sufficiently large number of
small allocations.
>
>>
>> To address the problem, when there is no free entry, refill one entry on the
>> free list (alloc_bulk) and then retry the allocation procedure on the free
>> list. Note that since some callers of unit_alloc() are not allowed to block
>> (e.g., bpf_cpumask_create), allocate the additional free entry in an atomic
>> manner (atomic = true in alloc_bulk).
>>
>> Signed-off-by: Changwoo Min <changwoo@...lia.com>
>> ---
>> kernel/bpf/memalloc.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>> index 889374722d0a..22fe9cfb2b56 100644
>> --- a/kernel/bpf/memalloc.c
>> +++ b/kernel/bpf/memalloc.c
>> @@ -784,6 +784,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>> struct llist_node *llnode = NULL;
>> unsigned long flags;
>> int cnt = 0;
>> + bool retry = false;
>
> "retry = false;" reads weird to me. Maybe rename it as "retried"?
"retried" reads better. Will fix it.
>
>>
>> /* Disable irqs to prevent the following race for majority of prog types:
>> * prog_A
>> @@ -795,6 +796,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>> * Use per-cpu 'active' counter to order free_list access between
>> * unit_alloc/unit_free/bpf_mem_refill.
>> */
>> +retry_alloc:
>> local_irq_save(flags);
>> if (local_inc_return(&c->active) == 1) {
>> llnode = __llist_del_first(&c->free_llist);
>> @@ -815,6 +817,13 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>> */
>> local_irq_restore(flags);
>>
>> + if (unlikely(!llnode && !retry)) {
>> + int cpu = smp_processor_id();
>> + alloc_bulk(c, 1, cpu_to_node(cpu), true);
> cpu_to_node() is not necessary, we can just do
>
> alloc_bulk(c, 1, NUMA_NO_NODE, true);
Sure, will change it as suggested.
>
> Also, maybe we can let alloc_bulk return int (0 or -ENOMEM).
> For -ENOMEM, there is no need to goto retry_alloc.
>
> Does this make sense?
Yup, I will change the alloc_bulk() as it returns 0 or -ENOMEM.
Regards,
Changwoo Min