lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 25 Apr 2018 08:12:42 -0600
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Gianluca Borello <g.borello@...il.com>
Cc:     netdev@...r.kernel.org, daniel@...earbox.net, ast@...nel.org
Subject: Re: [RFC bpf] bpf, x64: fix JIT emission for dead code

On Wed, Apr 25, 2018 at 05:42:16AM +0000, Gianluca Borello wrote:
> Commit 2a5418a13fcf ("bpf: improve dead code sanitizing") replaced dead
> code with a series of ja-1 instructions, for safety. That made JIT
> compilation much more complex for some BPF programs. One instance of such
> programs is, for example:
> 
> bool flag = false
> ...
> /* A bunch of other code */
> ...
> if (flag)
>         do_something()
> 
> In some cases llvm is not able to remove at compile time the code for
> do_something(), so the generated BPF program ends up with a large amount
> of dead instructions. In one specific real life example, there are two
> series of ~500 and ~1000 dead instructions in the program. When the
> verifier replaces them with a series of ja-1 instructions, it causes an
> interesting behavior at JIT time.
> 
> During the first pass, since all the instructions are estimated at 64
> bytes, the ja-1 instructions end up being translated as 5 bytes JMP
> instructions (0xE9), since the jump offsets become increasingly large (>
> 127) as each instruction gets discovered to be 5 bytes instead of the
> estimated 64.
> 
> Starting from the second pass, the first N instructions of the ja-1
> sequence get translated into 2 bytes JMPs (0xEB) because the jump offsets
> become <= 127 this time. In particular, N is defined as roughly 127 / (5
> - 2) ~= 42. So, each further pass will make the subsequent N JMP
> instructions shrink from 5 to 2 bytes, making the image shrink every time.
> This means that in order to have the entire program converge, there need
> to be, in the real example above, at least ~1000 / 42 ~= 24 passes just
> for translating the dead code. If we add this number to the passes needed
> to translate the other non dead code, it brings such program to 40+
> passes, and JIT doesn't complete. Ultimately the userspace loader fails
> because such BPF program was supposed to be part of a prog array owner
> being JITed.
> 
> While it is certainly possible to try to refactor such programs to help
> the compiler remove dead code, the behavior is not really intuitive and it
> puts further burden on the BPF developer who is not expecting such
> behavior. To make things worse, such programs are working just fine in all
> the kernel releases prior to the ja-1 fix.
> 
> A possible approach to mitigate this behavior consists into noticing that
> for ja-1 instructions we don't really need to rely on the estimated size
> of the previous and current instructions, we know that a -1 BPF jump
> offset can be safely translated into a 0xEB instruction with a jump offset
> of -2.
> 
> Such fix brings the BPF program in the previous example to complete again
> in ~9 passes.
> 
> Fixes: 2a5418a13fcf ("bpf: improve dead code sanitizing")
> Signed-off-by: Gianluca Borello <g.borello@...il.com>
> ---
> Hi
> 
> Posting this as RFC since I just wanted to report this potential bug and
> propose a possible fix, although I'm not sure if:
> 
> 1) There might be a better fix on the JIT side
> 2) We might want to replace the ja-1 instructions with something else
> 3) We might want to ignore this issue
> 
> If we choose option 3, I'd just like to point out that this caused a real
> regression on a couple BPF programs that are part of a larger collection
> of programs that used to work fine on 4.15 and I recently found broken 
> in 4.16, so there would be some value in somehow addressing this.
> 
> Thanks
> 
>  arch/x86/net/bpf_jit_comp.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index b725154182cc..abce27ceb411 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1027,7 +1027,17 @@ xadd:			if (is_imm8(insn->off))
>  			break;
>  
>  		case BPF_JMP | BPF_JA:
> -			jmp_offset = addrs[i + insn->off] - addrs[i];
> +			if (insn->off == -1)
> +				/* -1 jmp instructions will always jump
> +				 * backwards two bytes. Explicitly handling
> +				 * this case avoids wasting too many passes
> +				 * when there are long sequences of replaced
> +				 * dead code.
> +				 */
> +				jmp_offset = -2;
> +			else
> +				jmp_offset = addrs[i + insn->off] - addrs[i];
> +

Impressive analysis and fix. Thank you Gianluca.
Acked-by: Alexei Starovoitov <ast@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ