netdev - Re: [RFC PATCH v2 bpf-next 0/2] verifier liveness simplification

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180830021837.smtakvn62ieastst@ast-mbp>
Date:   Wed, 29 Aug 2018 19:18:39 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Edward Cree <ecree@...arflare.com>
Cc:     ast@...nel.org, daniel@...earbox.net, netdev@...r.kernel.org
Subject: Re: [RFC PATCH v2 bpf-next 0/2] verifier liveness simplification

On Wed, Aug 22, 2018 at 08:00:46PM +0100, Edward Cree wrote:
> The first patch is a simplification of register liveness tracking by using
>  a separate parentage chain for each register and stack slot, thus avoiding
>  the need for logic to handle callee-saved registers when applying read
>  marks.  In the future this idea may be extended to form use-def chains.
> The second patch adds information about misc/zero data on the stack to the
>  state dumps emitted to the log at various points; this information was
>  found essential in debugging the first patch, and may be useful elsewhere.

I think this set is a great improvement in liveness tracking,
so depsite seeing the issues I applied it to bpf-next.

I think it's a better base to continue debugging.
In particular:
1. we have instability issue in the verifier.
 from time to time the verifier goes to process extra 7 instructions on one
 of the cilium tests. This was happening before and after this set.
2. there is a nice improvement in number of processed insns with this set,
 but the difference I cannot explain, hence it has to debugged.
 In theory the liveness rewrite shouldn't cause the difference in processed insns.

If not for the issue 1 I would argue that the issue 2 means that the set has to
be debugged before going in, but since the verifier is unstable it's better
to debug from this base with this patch set applied (because it greatly
simplifies liveness and adds additional debug in patch2)
and once we figure out the issue 1, I hope, the issue 2 will be resolved automatically.

The numbers on cilium bpf programs:
                      before1    before2   after1  after2
bpf_lb-DLB_L3.o 	2003      2003      2003    2003
bpf_lb-DLB_L4.o 	3173      3173      3173    3173
bpf_lb-DUNKNOWN.o 	1080      1080      1080    1080
bpf_lxc-DDROP_ALL.o	29587     29587     29587   29587
bpf_lxc-DUNKNOWN.o	37204     37211     36926   36933
bpf_netdev.o		11283     11283     11188   11188
bpf_overlay.o		6679      6679      6679    6679
bpf_lcx_jit.o		39657     39657     39561   39561

notice how bpf_lxc-DUNKNOWN.o fluctuates with +7 before and after. That is issue 1.
bpf_lxc-DUNKNOWN.o, bpf_netdev.o, and bpf_lcx_jit.o have small improvements.
That is issue 2.

To reproduce above numbers clone this repo: https://github.com/4ast/bpf_cilium_test
and run .sh. The .o files in there are pretty old cilium bpf programs.
I kept them frozen and didn't recompile for more than a year to keep stable
base line and track the progress of the verifier in 'processed insns'.

Thanks