linux-kernel - Re: [PATCH] bpf: Call cond_resched() to avoid soft lockup in trie

Message-ID: <CALrw=nFvUwmpjUMYh5iJqjo6SbAO8fZt8pkys7iDjZHfpF2DxQ@mail.gmail.com>
Date: Wed, 18 Jun 2025 15:27:23 +0100
From: Ignat Korchagin <ignat@...udflare.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Matt Fleming <matt@...dmodwrite.com>, Song Liu <song@...nel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, 
	Eduard Zingerman <eddyz87@...il.com>, Yonghong Song <yonghong.song@...ux.dev>, 
	John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	kernel-team <kernel-team@...udflare.com>, Matt Fleming <mfleming@...udflare.com>, 
	Jesper Dangaard Brouer <hawk@...nel.org>
Subject: Re: [PATCH] bpf: Call cond_resched() to avoid soft lockup in trie_free()

On Wed, Jun 18, 2025 at 3:01 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Wed, Jun 18, 2025 at 5:29 AM Matt Fleming <matt@...dmodwrite.com> wrote:
> >
> > On Tue, Jun 17, 2025 at 4:55 PM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Tue, Jun 17, 2025 at 2:43 AM Matt Fleming <matt@...dmodwrite.com> wrote:
> > > >
> > >
> > > > soft lockup - CPU#41 stuck for 76s
> > >
> > > How many elements are in the trie that it takes 76 seconds??
> >
> > We run our maps with potentially millions of entries, so it's the size
> > of the map plus the fact that kfree() does more work with KASAN that
> > triggers this for us.
> >
> > > I feel the issue is different.
> > > It seems the trie_free() algorithm doesn't scale.
> > > Pls share a full reproducer.
> >
> > Yes, the scalability of the algorithm is also an issue. Jesper (CC'd)
> > had some thoughts on this.
> >
> > But regardless, it seems like a bad idea to have an unbounded loop
> > inside the kernel that processes user-controlled data.
>
> 1M kfree should still be very fast even with kasan, lockdep, etc.
> 76 seconds is an algorithm problem. Address the root cause.

What if we later have 1G? 100G? Apart from the root cause, we still
have "scalability concerns" unless we can somehow reimplement this as
O(1).

Ignat