linux-kernel - Re: [PATCH] bpf: Call cond_resched() to avoid soft lockup in trie


Message-ID: <CAADnVQ+_UZ2xUaV-=mb63f+Hy2aVcfC+y9ds1X70tbZhV8W9gw@mail.gmail.com>
Date: Fri, 27 Jun 2025 12:36:34 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Matt Fleming <matt@...dmodwrite.com>
Cc: Ignat Korchagin <ignat@...udflare.com>, Song Liu <song@...nel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, 
	Eduard Zingerman <eddyz87@...il.com>, Yonghong Song <yonghong.song@...ux.dev>, 
	John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	kernel-team <kernel-team@...udflare.com>, Matt Fleming <mfleming@...udflare.com>, 
	Jesper Dangaard Brouer <hawk@...nel.org>
Subject: Re: [PATCH] bpf: Call cond_resched() to avoid soft lockup in trie_free()

On Fri, Jun 27, 2025 at 6:20 AM Matt Fleming <matt@...dmodwrite.com> wrote:
>
> On Wed, Jun 18, 2025 at 3:50 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > Do your homework pls.
> > Set max_entries to 100G and report back.
> > Then set max_entries to 1G _with_ cond_resched() hack and report back.
>
> Hi,
>
> I put together a small reproducer
> https://github.com/xdp-project/bpf-examples/pull/130 which gives the
> following results on an AMD EPYC 9684X 96-Core machine:
>
> | Num of map entries | Linux 6.12.32 |  KASAN  | cond_resched |
> |--------------------|---------------|---------|--------------|
> | 1K                 | 0ms           | 4ms     | 0ms          |
> | 10K                | 2ms           | 50ms    | 2ms          |
> | 100K               | 32ms          | 511ms   | 32ms         |
> | 1M                 | 427ms         | 5478ms  | 420ms        |
> | 10M                | 5056ms        | 55714ms | 5040ms       |
> | 100M               | 67253ms       | *       | 62630ms      |
>
> * - I gave up waiting after 11.5 hours
>
> Enabling KASAN makes the durations an order of magnitude bigger. The
> cond_resched() patch eliminates the soft lockups with no effect on the
> times.

Good. Now you see my point, right?
The cond_resched() doesn't fix the issue.
Over a minute to free a trie of 100M elements is horrible.
Try 100M kmalloc/kfree to see that slab is not the issue.
The trie_free() algorithm is to blame: it doesn't need to restart
from the root for every element. Fix the root cause.
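
To illustrate the difference between restarting from the root for every
element and freeing in one pass, here is a rough userspace sketch. It is
not the kernel code and not a proposed patch: struct node, trie_insert(),
free_restart_from_root() and free_single_pass() are hypothetical names, the
trie is a plain two-child binary trie with one node per key bit, and
malloc/free stand in for the kernel allocator. The single-pass variant uses
the well-known right-rotation trick for tearing down a binary tree, which
needs no stack and no parent pointers. RCU and locking are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a trie node (not the kernel struct). */
struct node {
	struct node *child[2];
};

/* Insert the top `bits` bits of `key` (bits <= 32), one node per bit.
 * Returns the number of nodes newly allocated. */
static size_t trie_insert(struct node **root, uint32_t key, int bits)
{
	struct node **slot = root;
	size_t allocated = 0;

	for (int i = 0; i <= bits; i++) {
		if (!*slot) {
			*slot = calloc(1, sizeof(**slot));
			assert(*slot);
			allocated++;
		}
		if (i < bits)
			slot = &(*slot)->child[(key >> (31 - i)) & 1];
	}
	return allocated;
}

/* The pattern criticised above: walk down from the root to find a leaf,
 * free it, then start again from the root. `steps` counts node visits. */
static size_t free_restart_from_root(struct node **root, size_t *steps)
{
	size_t freed = 0;

	while (*root) {
		struct node **slot = root;

		for (;;) {
			struct node *n = *slot;

			(*steps)++;
			if (n->child[0]) {
				slot = &n->child[0];
			} else if (n->child[1]) {
				slot = &n->child[1];
			} else {
				free(n);
				*slot = NULL;
				freed++;
				break;
			}
		}
	}
	return freed;
}

/* Single pass: while the current node has a left child, rotate it up;
 * otherwise free the node and continue with its right child. Every
 * iteration either rotates or frees, and each node is rotated up at most
 * once, so the loop runs at most 2 * (number of nodes) times. */
static size_t free_single_pass(struct node **root, size_t *steps)
{
	struct node *n = *root;
	size_t freed = 0;

	while (n) {
		struct node *left = n->child[0];

		(*steps)++;
		if (left) {
			n->child[0] = left->child[1];
			left->child[1] = n;
			n = left;
		} else {
			struct node *next = n->child[1];

			free(n);
			freed++;
			n = next;
		}
	}
	*root = NULL;
	return freed;
}
```

Both functions free the same nodes. The step counters show where the time
goes: the restart variant pays the full depth of the trie for every node it
frees, while the single-pass variant stays linear in the number of nodes.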