[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210923205105.zufadghli5772uma@ast-mbp>
Date: Thu, 23 Sep 2021 13:51:05 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Dave Marchevsky <davemarchevsky@...com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Yonghong Song <yhs@...com>
Subject: Re: [RFC PATCH bpf-next 0/2] bpf: keep track of prog verification
stats
On Mon, Sep 20, 2021 at 08:11:10AM -0700, Dave Marchevsky wrote:
> The verifier currently logs some useful statistics in
> print_verification_stats. Although the text log is an effective feedback
> tool for an engineer iterating on a single application, it would also be
> useful to enable tracking these stats in a more structured form for
> fleetwide or historical analysis, which this patchset attempts to do.
>
> A concrete motivating usecase which came up in recent weeks:
>
> A team owns a complex BPF program, with various folks extending its
> functionality over the years. An engineer tries to make a relatively
> simple addition but encounters "BPF program is too large. Processed
> 1000001 insn".
>
> Their changes bumped the processed insns from 700k to over the limit and
> there's no obvious way to simplify. They must now consider a large
> refactor in order to incorporate the new feature. What if there was some
> previous change which bumped processed insns from 200k->700k which
> _could_ be modified to stress verifier less? Tracking historical
> verifier stats for each version of the program over the years would
> reduce manual work necessary to find such a change.
>
>
> Although parsing the text log could work for this scenario, a solution
> that's resilient to log format and other verifier changes would be
> preferable.
>
> This patchset adds a bpf_prog_verif_stats struct - containing the same
> data logged by print_verification_stats - which can be retrieved as part
> of bpf_prog_info. Looking for general feedback on approach and a few
> specific areas before fleshing it out further:
>
> * None of my usecases require storing verif_stats for the lifetime of a
> loaded prog, but adding to bpf_prog_aux felt more correct than trying
> to pass verif_stats back as part of BPF_PROG_LOAD
> * The verif_stats are probably not generally useful enough to warrant
> inclusion in fdinfo, but hoping to get confirmation before removing
> that change in patch 1
> * processed_insn, verification_time, and total_states are immediately
> useful for me, rest were added for parity with
> print_verification_stats. Can remove.
> * Perhaps a version field would be useful in verif_stats in case future
> verifier changes make some current stats meaningless
> * Note: stack_depth stat was intentionally skipped to keep patch 1
> simple. Will add if approach looks good.
Sorry for the delay. LPC consumes a lot of mental energy :)
I see the value of exposing some of the verification stats as prog_info.
Let's look at the list:
struct bpf_prog_verif_stats {
__u64 verification_time;
__u32 insn_processed;
__u32 max_states_per_insn;
__u32 total_states;
__u32 peak_states;
__u32 longest_mark_read_walk;
};
verification_time is non deterministic. It varies with frequency
and run-to-run. I don't see how alerting tools can use it.
insn_processed is indeed the main verification metric.
By now it's well known and understood.
max_states_per_insn, total_states, etc were the metrics I've studied
carefully with pruning, back tracking and pretty much every significant
change I did or reiviewed in the verifier. They're useful to humans
and developers, but I don't see how alerting tools will use them.
So it feels to me that insn_processed alone will be enough to address the
monitoring goal.
It can be exposed to fd_info and printed by bpftool.
If/when it changes with some future verifier algorithm we should be able
to approximate it.
Powered by blists - more mailing lists