netdev - Re: [PATCH v2 bpf-next 0/3] bpf/verifier: subprog/func

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180403010802.jkqffxw4m75oioj7@ast-mbp>
Date:   Mon, 2 Apr 2018 18:08:04 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Edward Cree <ecree@...arflare.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org
Subject: Re: [PATCH v2 bpf-next 0/3] bpf/verifier: subprog/func_call
 simplifications

On Thu, Mar 29, 2018 at 11:44:17PM +0100, Edward Cree wrote:
> By storing subprog boundaries as a subprogno mark on each insn, rather than
>  a start (and implicit end) for each subprog, we collect a number of gains:
> * More efficient determination of which subprog contains a given insn, and
>   thus of find_subprog (which subprog begins at a given insn).
> * Number of verifier "full recursive walk" passes is reduced, since most of
>   the work is done in the main insn walk (do_check()).  Leftover work in
>   other passes is mostly linear scans (O(insn_cnt)) or, in the case of
>   check_max_stack_depth(), a topological sort (O(subprog_cnt)).
> 
> Some other changes were also included to support this:
> * Per-subprog info is stored in env->subprog_info, an array of structs,
>   rather than several arrays with a common index.
> * Call graph is now stored in the new bpf_subprog_info struct; used here
>   for check_max_stack_depth() but may have other uses too.
> 
> Along with this, patch #3 puts parent pointers (used by liveness analysis)
>  in the registers instead of the func_state or verifier_state, so that we
>  don't need skip_callee() machinery.  This also does the right thing for
>  stack slots, so they don't need their own special handling for liveness
>  marking either.

I like patch 3 and going to play with it.
How did you test it ? Do you have processed_insn numbers for
cilium or selftests programs before and after?
There should be no difference, right?
Essentially it's trading extra run-time cost of skip_callee logic
for higher memory overhead due to parent pointers in every register
state. I guess that's ok, since it does cleanup things nicely.

As far as patch 1 it was very difficult to review, since several logically
different things clamped together. So breaking it apart:
- converting two arrays of subprog_starts and subprog_stack_depth into
  single array of struct bpf_subprog_info is a good thing
- tsort is interesting, but not sure it's correct. more below
- but main change of combining subprog logic with do_check is no good.
The verifier have to move towards compiler style code analysis instead of
the other way around. It has to analyze the program in simple and
hopefully easy to understand passes instead of combinging everything
into one loop. We _have_ to get rid of do_check() approach and
corresponding insn_processed counter. That was and still is
the biggest and most pressing issue we need to solve in bpf verification.
The verifier has to switch to compiler style code analysis to scale.
The algorithms should be such that scale with thousands of instructions
and thousands of functions. The only way I see reaching that goal
is to do hierarchical program analysis in passes:
- identify subrpograms
- identify basic blocks, build control flow graph
- build dom graph, ssa and so on
- and do per-basic block liveness and data flow analysis that propagates
  the combined result further into other BBs along cfg edges.
There will be no 'do_check() across whole program' walk.
Combining subprog pass with do_check is going into opposite direction
of this long term work. Divide and conquer. Combining more things into
do_check is the opposite of this programming principle.
My short term plan is to split basic instruction correctness checks
out of do_check() loop into separate pass and run it early on.
It's necessary for bpf libraries to work, since the verifier cannot do
full data flow analysis at this point, but should build cfg and move
as much as possible out of instruction by instruction walk to scale
with number of bpf programs/libraries and sizes of them.

As far as tsort approach for determining max stack depth...
It's an interesting idea, but implementation is suffering from the same
'combine everything into one loop' coding issue.
I think updating total_stack_depth math should be separate from sorting,
since the same function can be part of different stacks with different max.
I don't see how updating global subprog_info[i].total_stack_depth can
be correct. It has to be different for every stack and the longest
stack is not necessarily the deepest. May be I'm missing something,
but combined sort and stack_depth math didn't make it easy to review.
I find existing check_max_stack_depth() algo much easier to understand.