Message-ID: <d71f2351-af17-7e20-7f99-d628b7ab5765@bytedance.com>
Date: Thu, 26 May 2022 10:40:35 +0800
From: Feng Zhou <zhoufeng.zf@...edance.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Network Development <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Xiongchun Duan <duanxiongchun@...edance.com>,
Muchun Song <songmuchun@...edance.com>,
Dongdong Wang <wangdongdong.6@...edance.com>,
Cong Wang <cong.wang@...edance.com>,
Chengming Zhou <zhouchengming@...edance.com>
Subject: Re: [External] Re: [PATCH v2 2/2] selftest/bpf/benchs: Add bpf_map
benchmark
On 2022/5/25 8:13 AM, Alexei Starovoitov wrote:
> On Tue, May 24, 2022 at 12:53 AM Feng zhou <zhoufeng.zf@...edance.com> wrote:
>> +static void setup(void)
>> +{
>> + struct bpf_link *link;
>> + int map_fd, i, max_entries;
>> +
>> + setup_libbpf();
>> +
>> + ctx.skel = bpf_map_bench__open_and_load();
>> + if (!ctx.skel) {
>> + fprintf(stderr, "failed to open skeleton\n");
>> + exit(1);
>> + }
>> +
>> + link = bpf_program__attach(ctx.skel->progs.benchmark);
>> + if (!link) {
>> + fprintf(stderr, "failed to attach program!\n");
>> + exit(1);
>> + }
>> +
>> + //fill hash_map
>> + map_fd = bpf_map__fd(ctx.skel->maps.hash_map_bench);
>> + max_entries = bpf_map__max_entries(ctx.skel->maps.hash_map_bench);
>> + for (i = 0; i < max_entries; i++)
>> + bpf_map_update_elem(map_fd, &i, &i, BPF_ANY);
>> +}
> ...
>> +SEC("fentry/" SYS_PREFIX "sys_getpgid")
>> +int benchmark(void *ctx)
>> +{
>> + u32 key = bpf_get_prandom_u32();
>> + u64 init_val = 1;
>> +
>> + bpf_map_update_elem(&hash_map_bench, &key, &init_val, BPF_ANY);
>> + return 0;
>> +}
> This benchmark is artificial at its extreme.
> First it populates the map till max_entries and then
> constantly bounces off the max_entries limit in a bpf prog.
> Sometimes random_u32 will be less than max_entries
> and map_update_elem will hit a fast path,
> but most of the time it will fail to alloc_htab_elem()
> and will fail to map_update_elem.
>
> It does demonstrate that percpu_free_list is inefficient
> when it's empty, but there is no way such a microbenchmark
> justifies optimizing this corner case.
>
> If there is a production use case please code it up in
> a benchmark.
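For scale, the point about rarely hitting the fast path can be quantified: the map is pre-filled with keys 0..max_entries-1, so a random u32 key takes the in-place update path only when it lands below max_entries, i.e. with probability max_entries / 2^32. A quick sketch (the max_entries values are illustrative):

```python
# Probability that bpf_get_prandom_u32() returns a key already present
# in the map, i.e. a value in [0, max_entries).
def fast_path_hit_probability(max_entries: int) -> float:
    return max_entries / 2**32

for n in (1024, 65536, 1 << 20):
    print(f"max_entries={n}: hit probability ~ {fast_path_hit_probability(n):.2e}")
```

Even with a million entries, fewer than 0.03% of updates hit an existing key, so almost every update exercises the full-map failure path.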
This corner case is not easy to reproduce. In a scenario with a surge of
network traffic, the map becomes full while a large number of update
operations keep arriving.
Just as Yonghong Song said:
'''
in your use case, you have lots of different keys and your intention is
NOT to capture all the keys in the hash table. So given a hash table, it
is possible that the hash table will become full even if you increase
the hashtable size. Maybe you will occasionally delete some keys, which
will free some space, but the space will be quickly occupied by the new
updates.
'''
>
> Also there is a lot of other overhead: syscall and atomic-s.
> To stress map_update_elem please use a for() loop inside bpf prog.
OK, I will change how the test case exercises the map.
I added this benchmark just to reproduce the case. As for whether to
optimize this case, I use ftrace
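A rough sketch of what the reworked BPF program might look like: a bounded for loop so each trigger performs many updates, amortizing the syscall and attach overhead. This is an untested sketch, and the 10000 iteration count is an arbitrary placeholder:

```c
/* Sketch only: loop inside the BPF program so the per-update cost
 * dominates the measurement instead of the syscall path. Bounded
 * loops like this pass the verifier on recent kernels; bpf_loop()
 * could serve the same purpose on newer ones. */
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
	u64 init_val = 1;
	u32 key;
	int i;

	for (i = 0; i < 10000; i++) {
		key = bpf_get_prandom_u32();
		bpf_map_update_elem(&hash_map_bench, &key, &init_val, BPF_ANY);
	}
	return 0;
}
```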
'''
cd /sys/kernel/debug/tracing/
echo > trace
echo htab_map_update_elem > set_graph_function
echo function_graph > current_tracer
cat per_cpu/cpu0/trace
echo nop > current_tracer
'''
to confirm whether, when the map is full, the update operation keeps
grabbing the spinlock of each CPU's free list. Then I disable ftrace,
measure how long the update takes, and check whether there is any
improvement over the unpatched kernel.
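To pull the per-call duration out of the function_graph output, a small awk filter is enough. The sample trace lines below are fabricated for illustration; in practice you would pipe in per_cpu/cpu0/trace:

```shell
# Print the duration column (in us) of each htab_map_update_elem call
# from function_graph output. Sample lines are made up for illustration.
cat <<'EOF' | awk '/htab_map_update_elem/ { print $2 }'
 0)   12.345 us  |  htab_map_update_elem();
 0)    8.210 us  |  htab_map_update_elem();
EOF
```

On a real trace, replace the heredoc with `awk '...' per_cpu/cpu0/trace`.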