linux-kernel - Re: [RFC bpf-next 0/2] sharing bpf runtime stats with /dev/bpf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200314024322.vymr6qkxsf6nzpum@ast-mbp.dhcp.thefacebook.com>
Date:   Fri, 13 Mar 2020 19:43:22 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Song Liu <songliubraving@...com>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org, kernel-team@...com,
        ast@...nel.org, daniel@...earbox.net, mcgrof@...nel.org,
        keescook@...omium.org, yzaikin@...gle.com, peterz@...radead.org,
        bristot@...hat.com, mingo@...nel.org
Subject: Re: [RFC bpf-next 0/2] sharing bpf runtime stats with /dev/bpf_stats

On Fri, Mar 13, 2020 at 05:35:16PM -0700, Song Liu wrote:
> Motivation (copied from 2/2):
> 
> ======================= 8< =======================
> Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
> Typical userspace tools use kernel.bpf_stats_enabled as follows:
> 
>   1. Enable kernel.bpf_stats_enabled;
>   2. Check program run_time_ns;
>   3. Sleep for the monitoring period;
>   4. Check program run_time_ns again, calculate the difference;
>   5. Disable kernel.bpf_stats_enabled.
> 
> The problem with this approach is that only one userspace tool can toggle
> this sysctl. If multiple tools toggle the sysctl at the same time, the
> measurement may be inaccurate.
> 
> To fix this problem while keep backward compatibility, introduce
> /dev/bpf_stats. sysctl kernel.bpf_stats_enabled will only change the
> lowest bit of the static key. /dev/bpf_stats, on the other hand, adds 2
> to the static key for each open fd. The runtime stats is enabled when
> kernel.bpf_stats_enabled == 1 or there is open fd to /dev/bpf_stats.
> 
> With /dev/bpf_stats, user space tool would have the following flow:
> 
>   1. Open a fd to /dev/bpf_stats;
>   2. Check program run_time_ns;
>   3. Sleep for the monitoring period;
>   4. Check program run_time_ns again, calculate the difference;
>   5. Close the fd.
> ======================= 8< =======================
> 
> 1/2 adds a few new API to jump_label.
> 2/2 adds the /dev/bpf_stats and adjust kernel.bpf_stats_enabled handler.
> 
> Please share your comments.

Conceptually makes sense to me. Few comments:
1. I don't understand why +2 logic is necessary.
Just do +1 for every FD and change proc_do_static_key() from doing
explicit enable/disable to do +1/-1 as well on transition from 0->1 and 1->0.
The handler would need to check that 1->1 and 0->0 is a nop.

2. /dev is kinda awkward. May be introduce a new bpf command that returns fd?

3. Instead of 1 and 2 tweak sysctl to do ++/-- unconditionally?
 Like repeated sysctl kernel.bpf_stats_enabled=1 will keep incrementing it
 and would need equal amount of sysctl kernel.bpf_stats_enabled=0 to get
 it back to zero where it will stay zero even if users keep spamming
 sysctl kernel.bpf_stats_enabled=0.
 This way current services that use sysctl will keep working as-is.
 Multiple services that currently collide on sysctl will magically start
 working without any changes to them. It is still backwards compatible.