lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20190425221021.ov2jj4piann7wmid@ast-mbp.dhcp.thefacebook.com> Date: Thu, 25 Apr 2019 15:10:24 -0700 From: Alexei Starovoitov <alexei.starovoitov@...il.com> To: Jiong Wang <jiong.wang@...ronome.com> Cc: Alexei Starovoitov <ast@...nel.org>, daniel@...earbox.net, netdev@...r.kernel.org, bpf@...r.kernel.org, Jakub Kicinski <jakub.kicinski@...ronome.com>, "oss-drivers@...ronome.com" <oss-drivers@...ronome.com> Subject: Re: 32-bit zext time complexity (Was Re: [PATCH bpf-next] selftests/bpf: two scale tests) On Thu, Apr 25, 2019 at 08:25:44AM +0100, Jiong Wang wrote: > > Alexei Starovoitov writes: > > > On Thu, Apr 25, 2019 at 12:07:06AM +0100, Jiong Wang wrote: > >> > >> Alexei Starovoitov writes: > >> > >> > Add two tests to check that sequence of 1024 jumps is verifiable. > >> > > >> > Signed-off-by: Alexei Starovoitov <ast@...nel.org> > >> > --- > >> > tools/testing/selftests/bpf/test_verifier.c | 70 ++++++++++++++++++++ > >> > tools/testing/selftests/bpf/verifier/scale.c | 18 +++++ > >> > >> I am rebasing 32-bit opt pass on top of latest bpf-next and found these new > >> tests take more than 20 minutes to run and had not finished after that. > >> > >> The reason the following insn filling insde bpf_fill_scale1 is generating > >> nearly 1M insn whose results are recognized as safe to be poisoned. > >> > >> bpf_fill_scale1: > >> while (i < MAX_TEST_INSNS - 1025) > >> insn[i++] = BPF_ALU64_IMM(BPF_MOV, BPF_REG_0, 42); > >> > >> For each hi32 poisoning, there will be one call to "bpf_patch_insn_data" > >> which actually is not cheap (adjust jump insns, insn aux info etc). Now, > >> 1M call to it has exhausted server resources as described, 20minutes running > >> still not finished. > >> > >> For real world applications, we don't do hi32 poisoning, and there isn't much > >> lo32 zext. Benchmarking those bpf programs inside Cilium shows the final > >> zext pass adds about 8% ~ 15% verification time. > >> > >> The zext pass based on top of "bpf_patch_insn_data" looks more and more is > >> not the best approach to utilize the read32 analysis results. > >> > >> Previously, in v1 cover letter, I listed some of my other thoughts on how to > >> utilize the liveness analysis results: > >> > >> 1 Minor change on back-end JIT hook, also pass aux_insn information to > >> back-ends so they could have per insn information and they could do > >> zero extension for the marked insn themselves using the most > >> efficient native insn. > >> > >> 2 Introduce zero extension insn for eBPF. Then verifier could insert > >> the new zext insn instead of lshift + rshift. zext could be JITed > >> more efficiently. > >> > >> 3 Otherwise JIT back-ends need to do peephole to catch lshift + rshift > >> and turn them into native zext. > > > > all options sounds like hacks to workaround inefficient bpf_patch_insn_data(). > > Especially option 2 will work only because single insn is replaced > > with another insn ? > > Option 1 should be a generic solution. It is passing verifier analysis > results generated by insn walk down to JIT back-ends. The information > passed down could be any analysis result useful for JIT code-gen. > > > Let's fix the algo of bpf_patch_insn_data() instead, so that 1 insn -> 2+ insn > > is also fast. > > The issue with 1 insn -> 2+ insn should be calling of bpf_adj_branches > which is doing another for_each_insn_in_prog traversal, so the zext > insertion becomes something like: > > for_each_insn_in_prog > ... > if (zext) > ... > for_each_insn_in_prog > > which is quadratic. One solution is we chain all branch insns during > previous insn traversal in for example cfg check, and keep the information > somewhere info bpf_prog (env->insn_aux_data is a good place to keep such > information, but insn patch helpers are supposed to work with bpf_prog) > then bpf_adj_branches could traversal this chain instead of iterating > through all insns. I don't think it will make it much faster. There could be just as many jumps as there are instructions. Note that bpf_patch_insn_single() is calling bpf_adj_branches() twice too. And dead_code + convert_ctx + fixup_bpf_calls are calling bpf_patch_insn_single() a lot. How about before dead_code pass we convert the program into basic-block format, patch it all, and then convert from bb back to offsets. Patching will become very cheap, since no loop over program will be necessary. A jump from bb-N to bb-M will stay as-is regardless of amount of patching was done inside each bb. The loops inside these patching passes will be converted from: for (i = 0; i < insn_cnt; i++, insn++) into: for each bb for each insn in bb As far as option 1 "also pass aux_insn information to JITs"... in theory it's fine, but looks like big refactoring to existing code. So if you want to make this bb conversion as step 2 and unblock the current patch set faster I suggest to go with option 2 "Introduce zero extension insn".
Powered by blists - more mailing lists