lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1a158ad7-f988-42bf-9a1e-b673ff9122c2@igalia.com>
Date: Thu, 13 Feb 2025 17:41:29 +0900
From: Changwoo Min <changwoo@...lia.com>
To: Song Liu <song@...nel.org>
Cc: ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
 martin.lau@...ux.dev, eddyz87@...il.com, yonghong.song@...ux.dev,
 john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
 haoluo@...gle.com, jolsa@...nel.org, tj@...nel.org, arighi@...dia.com,
 kernel-dev@...lia.com, bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next] bpf: Add a retry after refilling the free list
 when unit_alloc() fails

Hello Song,

Thank you for the review!

On 25. 2. 13. 03:33, Song Liu wrote:
> On Wed, Feb 12, 2025 at 12:49 AM Changwoo Min <changwoo@...lia.com> wrote:
>>
>> When there is no entry in the free list (c->free_llist), unit_alloc()
>> fails even when there is available memory in the system, causing allocation
>> failure in various BPF calls -- such as bpf_mem_alloc() and
>> bpf_cpumask_create().
>>
>> Such allocation failure can happen, especially when a BPF program tries many
>> allocations -- more than a delta between high and low watermarks -- in an
>> IRQ-disabled context.
> 
> Can we add a selftests for this scenario?

It would be a bit tricky to create an IRQ-disabled case in a BPF
program. However, I think it will be possible to reproduce the
allocation failure issue when allocating sufficiently enough
small allocations.

> 
>>
>> To address the problem, when there is no free entry, refill one entry on the
>> free list (alloc_bulk) and then retry the allocation procedure on the free
>> list. Note that since some callers of unit_alloc() do not allow to block
>> (e.g., bpf_cpumask_create), allocate the additional free entry in an atomic
>> manner (atomic = true in alloc_bulk).
>>
>> Signed-off-by: Changwoo Min <changwoo@...lia.com>
>> ---
>>   kernel/bpf/memalloc.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>> index 889374722d0a..22fe9cfb2b56 100644
>> --- a/kernel/bpf/memalloc.c
>> +++ b/kernel/bpf/memalloc.c
>> @@ -784,6 +784,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>>          struct llist_node *llnode = NULL;
>>          unsigned long flags;
>>          int cnt = 0;
>> +       bool retry = false;
> 
> "retry = false;" reads weird to me. Maybe rename it as "retried"?

"retried" reads betters. Will fix it.

> 
>>
>>          /* Disable irqs to prevent the following race for majority of prog types:
>>           * prog_A
>> @@ -795,6 +796,7 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>>           * Use per-cpu 'active' counter to order free_list access between
>>           * unit_alloc/unit_free/bpf_mem_refill.
>>           */
>> +retry_alloc:
>>          local_irq_save(flags);
>>          if (local_inc_return(&c->active) == 1) {
>>                  llnode = __llist_del_first(&c->free_llist);
>> @@ -815,6 +817,13 @@ static void notrace *unit_alloc(struct bpf_mem_cache *c)
>>           */
>>          local_irq_restore(flags);
>>
>> +       if (unlikely(!llnode && !retry)) {
>> +               int cpu = smp_processor_id();
>> +               alloc_bulk(c, 1, cpu_to_node(cpu), true);
> cpu_to_node() is not necessary, we can just do
> 
> alloc_bulk(c, 1, NUMA_NO_NODE, true);

Sure, will change it as suggested.

> 
> Also, maybe we can let alloc_bulk return int (0 or -ENOMEM).
> For -ENOMEM, there is no need to goto retry_alloc.
> 
> Does this make sense?

Yup, I will change the alloc_bulk() as it returns 0 or -ENOMEM.

Regards,
Changwoo Min

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ