netdev - Re: [PATCH bpf-next v3] selftests/bpf: Add LPM trie microbenchmarks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAENh_SS2R3aQByV_=WRCO=ZHknk_+pV7RhXA4qx5OGMBN1SnOA@mail.gmail.com>
Date: Tue, 29 Jul 2025 14:56:07 +0100
From: Matt Fleming <matt@...dmodwrite.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Andrii Nakryiko <andrii@...nel.org>, Eduard Zingerman <eddyz87@...il.com>, Shuah Khan <shuah@...nel.org>, 
	kernel-team <kernel-team@...udflare.com>, Jesper Dangaard Brouer <hawk@...nel.org>, 
	"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	bpf <bpf@...r.kernel.org>, Martin KaFai Lau <martin.lau@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, Network Development <netdev@...r.kernel.org>, 
	Matt Fleming <mfleming@...udflare.com>
Subject: Re: [PATCH bpf-next v3] selftests/bpf: Add LPM trie microbenchmarks

On Mon, Jul 28, 2025 at 3:35 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> Please make a full description of what the test does,
> since it's not trivial to decipher from the code.
> If I'm reading it correctly, first, the user space
> makes the LPM completely full and then lookup/update
> use random key-s within range.

Yep, that's correct. I'll provide a more comprehensive description in v4.

> But delete looks different. It seems the kernel delete
> operation can interleave with user space refilling the LPM,
> so ...
>
> >   lookup: throughput    7.423 ± 0.023 M ops/s (  7.423M ops/prod), latency  134.710 ns/op
> >   update: throughput    2.643 ± 0.015 M ops/s (  2.643M ops/prod), latency  378.310 ns/op
> >   delete: throughput    0.712 ± 0.008 M ops/s (  0.712M ops/prod), latency 1405.152 ns/op
>
> this comparison doesn't look like apples to apples,
> since delete will include user space refill time.

Yeah, you're right. Though we only measure the delete time from in the
BPF prog, delete throughput is essentially blocked while the refill
happens and because things are measured with a 1-second timer
(bench.c:sigalarm_handler) the refill time gets accounted for anyway.
I could try extrapolating the delete time like I've done for the free
op, i.e. we calculate how many ops were completed in what fraction of
a second.

> >     free: throughput    0.574 ± 0.003 K ops/s (  0.574K ops/prod), latency    1.743 ms/op
>
> Does this measure the free-ing of full LPM ?

Yes, this measures the total time to free every element in the trie.

> > +static void __lpm_validate(void)
>
> why double underscore ?
> Just lpm_validate() ?

The double underscore is used for the common validation parts, but
I'll rename this to include "_common()" (along with all other uses of
__).

> > +       /*
> > +        * Ideally we'd have access to the map ID but that's already
> > +        * freed before we enter trie_free().
> > +        */
> > +       BPF_CORE_READ_STR_INTO(&name, map, name);
> > +       if (bpf_strncmp(name, BPF_OBJ_NAME_LEN, "trie_free_map"))
> > +               return 0;
> > +
> > +       val = bpf_ktime_get_ns();
> > +       bpf_map_update_elem(&latency_free_start, &map, &val, BPF_ANY);
>
> Looks like there is only one lpm map.
> What's the point of that extra map ?
> Just have a global var for start time ?

Sure, I can make this a global variable for the start time instead.

> bpf_get_prandom_u32() is not free
> and modulo operation isn't free either.
> The benchmark includes their time.
> It's ok to have it, but add a mode where the bench
> tests linear lookup/update too with simple key.data++

Good idea.

> Since the map suppose to full before we start all keys
> should be there, right?

Yes.

> Let's add a sanity check that update() succeeds.

Will do.

> > +static int delete (__u32 index, bool *need_refill)
> > +{
> > +       struct trie_key key = {
> > +               .data = deleted_entries,
> > +               .prefixlen = prefixlen,
> > +       };
> > +
> > +       bpf_map_delete_elem(&trie_map, &key);
> > +
> > +       /* Do we need to refill the map? */
> > +       if (++deleted_entries == nr_entries) {
> > +               /*
> > +                * Atomicity isn't required because DELETE only supports
> > +                * one producer running concurrently. What we need is a
> > +                * way to track how many entries have been deleted from
> > +                * the trie between consecutive invocations of the BPF
> > +                * prog because a single bpf_loop() call might not
> > +                * delete all entries, e.g. when NR_LOOPS < nr_entries.
> > +                */
> > +               deleted_entries = 0;
> > +               *need_refill = true;
> > +               return 1;
>
> This early break is another reason that makes
> 'delete' op different from 'lookup/update'.
> Pls make all 3 consistent, so they can be compared to each other.

Hmm.. I'm not quite sure how to do that. lookup/update don't modify
the number of entries in the map whereas delete does (update only
updates existing entries, it doesn't create new ones). So when the map
is empty it needs to be refilled. You're right that somehow the time
to refill needs to be removed from the delete throughput/latency
numbers but fundamentally these 3 ops are not the same and I don't see
how to treat them as such.