Message-ID: <CAADnVQ+_UZ2xUaV-=mb63f+Hy2aVcfC+y9ds1X70tbZhV8W9gw@mail.gmail.com>
Date: Fri, 27 Jun 2025 12:36:34 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Matt Fleming <matt@...dmodwrite.com>
Cc: Ignat Korchagin <ignat@...udflare.com>, Song Liu <song@...nel.org>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Yonghong Song <yonghong.song@...ux.dev>,
John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
kernel-team <kernel-team@...udflare.com>, Matt Fleming <mfleming@...udflare.com>,
Jesper Dangaard Brouer <hawk@...nel.org>
Subject: Re: [PATCH] bpf: Call cond_resched() to avoid soft lockup in trie_free()
On Fri, Jun 27, 2025 at 6:20 AM Matt Fleming <matt@...dmodwrite.com> wrote:
>
> On Wed, Jun 18, 2025 at 3:50 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > Do your homework pls.
> > Set max_entries to 100G and report back.
> > Then set max_entries to 1G _with_ cond_resched() hack and report back.
>
> Hi,
>
> I put together a small reproducer
> https://github.com/xdp-project/bpf-examples/pull/130 which gives the
> following results on an AMD EPYC 9684X 96-Core machine:
>
> | Num of map entries | Linux 6.12.32 | KASAN   | cond_resched |
> |--------------------|---------------|---------|--------------|
> | 1K                 | 0ms           | 4ms     | 0ms          |
> | 10K                | 2ms           | 50ms    | 2ms          |
> | 100K               | 32ms          | 511ms   | 32ms         |
> | 1M                 | 427ms         | 5478ms  | 420ms        |
> | 10M                | 5056ms        | 55714ms | 5040ms       |
> | 100M               | 67253ms       | *       | 62630ms      |
>
> * - I gave up waiting after 11.5 hours
>
> Enabling KASAN makes the durations an order of magnitude bigger. The
> cond_resched() patch eliminates the soft lockups with no effect on the
> times.
Good. Now you see my point, right?
The cond_resched() doesn't fix the issue.
Over a minute to free a trie of 100M elements is horrible.
Try 100M kmalloc/kfree to see that slab is not the issue.
The trie_free() algorithm is to blame. It doesn't need to start
from the root for every element. Fix the root cause.
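
To illustrate the point about restarting from the root, here is a minimal
user-space sketch. It is not the kernel code and not a proposed patch: struct
node, trie_insert(), free_restart(), free_single_pass() and MAX_DEPTH are all
hypothetical names, and malloc/free stand in for the kernel allocator. It
compares a free loop that walks down from the root for every node against a
single pass with a small explicit stack, and counts node visits in each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space node; not the kernel's LPM trie node layout. */
struct node {
	struct node *child[2];
};

#define MAX_DEPTH 64 /* assumed upper bound on trie depth, in edges */

/* Insert a key of `bits` bits (MSB first), allocating missing nodes. */
static int trie_insert(struct node **root, unsigned long key, int bits)
{
	struct node **slot = root;

	assert(bits >= 0 && bits <= MAX_DEPTH);
	for (int i = bits - 1; i >= -1; i--) {
		if (!*slot) {
			*slot = calloc(1, sizeof(**slot));
			if (!*slot)
				return -1;
		}
		if (i >= 0)
			slot = &(*slot)->child[(key >> i) & 1];
	}
	return 0;
}

/* Walk down from the root again for every freed node; returns node visits. */
static unsigned long free_restart(struct node **root)
{
	unsigned long visits = 0;

	while (*root) {
		struct node **slot = root;

		for (;;) {
			struct node *n = *slot;

			visits++;
			if (n->child[0]) {
				slot = &n->child[0];
			} else if (n->child[1]) {
				slot = &n->child[1];
			} else {
				free(n);	/* childless node: free it, restart */
				*slot = NULL;
				break;
			}
		}
	}
	return visits;
}

/* Visit every node exactly once using a depth-bounded explicit stack. */
static unsigned long free_single_pass(struct node **root)
{
	struct node *stack[MAX_DEPTH + 2];
	unsigned long visits = 0;
	int top = 0;

	if (*root)
		stack[top++] = *root;
	while (top > 0) {
		struct node *n = stack[--top];

		visits++;
		for (int i = 0; i < 2; i++) {
			if (n->child[i]) {
				assert(top < MAX_DEPTH + 2);
				stack[top++] = n->child[i];
			}
		}
		free(n);
	}
	*root = NULL;
	return visits;
}
```

In this sketch the restart variant pays (depth + 1) visits per freed node,
while the single pass pays one. For a full 3-bit trie (15 nodes) that is 49
visits versus 15, and the gap grows with depth.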