Message-ID: <1f0c1df2-3e27-a0ef-90b2-41ce499af14b@iogearbox.net>
Date: Mon, 8 Jan 2018 22:59:04 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Alexei Starovoitov <ast@...nel.org>, davem@...emloft.net
Cc: torvalds@...ux-foundation.org, jannh@...gle.com,
alan@...ux.intel.com, netdev@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH bpf-next] bpf: introduce BPF_JIT_ALWAYS_ON config
On 01/08/2018 04:35 AM, Alexei Starovoitov wrote:
> The BPF interpreter has been used as part of the Spectre variant 2 attack (CVE-2017-5715).
>
> A quote from the Google Project Zero blog:
> "At this point, it would normally be necessary to locate gadgets in
> the host kernel code that can be used to actually leak data by reading
> from an attacker-controlled location, shifting and masking the result
> appropriately and then using the result of that as offset to an
> attacker-controlled address for a load. But piecing gadgets together
> and figuring out which ones work in a speculation context seems annoying.
> So instead, we decided to use the eBPF interpreter, which is built into
> the host kernel - while there is no legitimate way to invoke it from inside
> a VM, the presence of the code in the host kernel's text section is sufficient
> to make it usable for the attack, just like with ordinary ROP gadgets."
>
> To make the attacker's job harder, introduce a BPF_JIT_ALWAYS_ON config
> option that removes the interpreter from the kernel in favor of JIT-only mode.
> So far the eBPF JIT is supported by:
> x64, arm64, arm32, sparc64, s390, powerpc64, mips64
>
> The start of the JITed program is randomized and the code page is marked as read-only.
> In addition, "constant blinding" can be turned on with the net.core.bpf_jit_harden sysctl.
>
> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> ---
> init/Kconfig | 7 +++++++
> kernel/bpf/core.c | 9 +++++++++
> kernel/bpf/verifier.c | 4 ++++
> net/core/sysctl_net_core.c | 9 +++++++++
> 4 files changed, 29 insertions(+)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 2934249fba46..5e2a4a391ba9 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1392,6 +1392,13 @@ config BPF_SYSCALL
> Enable the bpf() system call that allows to manipulate eBPF
> programs and maps via file descriptors.
>
> +config BPF_JIT_ALWAYS_ON
> + bool "Permanently enable BPF JIT and remove BPF interpreter"
> + depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT
> + help
> + Enables BPF JIT and removes BPF interpreter to avoid
> + speculative execution of BPF instructions by the interpreter
> +
> config USERFAULTFD
> bool "Enable userfaultfd() system call"
> select ANON_INODES
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 70a534549cd3..42756c434e0b 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -781,6 +781,7 @@ noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
> }
> EXPORT_SYMBOL_GPL(__bpf_call_base);
>
> +#ifndef CONFIG_BPF_JIT_ALWAYS_ON
> /**
> * __bpf_prog_run - run eBPF program on a given context
> * @ctx: is the data we are operating on
> @@ -1376,6 +1377,7 @@ void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
> __bpf_call_base_args;
> insn->code = BPF_JMP | BPF_CALL_ARGS;
> }
> +#endif
>
> bool bpf_prog_array_compatible(struct bpf_array *array,
> const struct bpf_prog *fp)
> @@ -1427,9 +1429,11 @@ static int bpf_check_tail_call(const struct bpf_prog *fp)
> */
> struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
> {
> +#ifndef CONFIG_BPF_JIT_ALWAYS_ON
> u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1);
>
> fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1];
> +#endif
>
> /* eBPF JITs can rewrite the program in case constant
> * blinding is active. However, in case of error during
> @@ -1453,6 +1457,11 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
> */
> *err = bpf_check_tail_call(fp);
>
> +#ifdef CONFIG_BPF_JIT_ALWAYS_ON
> + if (!fp->jited)
> + *err = -ENOTSUPP;
> +#endif
This part here and ...
> return fp;
> }
> EXPORT_SYMBOL_GPL(bpf_prog_select_runtime);
[...]
> @@ -524,6 +530,9 @@ static __net_initdata struct pernet_operations sysctl_core_ops = {
>
> static __init int sysctl_core_init(void)
> {
> +#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_JIT_ALWAYS_ON)
> + bpf_jit_enable = 1;
> +#endif
... this one will race and break things in its current shape. One example
is the PTP classifier in the tree: sysctl_core_init() runs from fs_initcall(),
whereas ptp_classifier_init() is called from sock_init(), which runs from
core_initcall() and therefore earlier, as sketched below.
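For reference, a rough sketch of the ordering involved (initcall levels as
in include/linux/init.h, lower level runs first; not literal tree code):

  core_initcall(sock_init);       /* level 1: sock_init() ends up in
                                   * ptp_classifier_init(), which does the
                                   * cBPF bpf_prog_create() while
                                   * bpf_jit_enable is still 0. */
  fs_initcall(sysctl_core_init);  /* level 5: only here the patch sets
                                   * bpf_jit_enable = 1, which is too late. */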
So what will happen is that at this point in time bpf_jit_enable is not yet
set to 1, so when ptp_classifier_init() calls the cBPF bpf_prog_create(), it
migrates the insns over to eBPF, and bpf_migrate_filter() assumes that
bpf_prog_select_runtime() always succeeds, since when the JIT fails we simply
fall back to the interpreter anyway. Up until now the only error that
bpf_prog_select_runtime() could return was for a native eBPF prog load, so
bpf_migrate_filter() will return just fine, and on the first call into the
PTP classifier from a network packet we'll get a NULL pointer deref since
fp->bpf_func is still NULL. So bpf_jit_enable would rather need to be set
much earlier during init, or e.g. in the JITs themselves; see the sketch below.
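A minimal sketch of the former, assuming we give bpf_jit_enable its final
value right at its definition instead of flipping it from an initcall
(untested, just to illustrate the idea):

  /* Sketch only: initializing at definition time means even the earliest
   * initcall users such as ptp_classifier_init() already see the JIT as
   * mandatory when BPF_JIT_ALWAYS_ON is set.
   */
  #ifdef CONFIG_BPF_JIT_ALWAYS_ON
  int bpf_jit_enable __read_mostly = 1;
  #else
  int bpf_jit_enable __read_mostly;
  #endif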
Other than that, I was wondering whether the arm32 eBPF JIT could cause
trouble for cBPF as well, but that looks not to be the case: only alu64
div/mod and xadd are not implemented there yet, and those are not used in
the migration, so that should be okay.
> register_net_sysctl(&init_net, "net/core", net_core_table);
> return register_pernet_subsys(&sysctl_core_ops);
> }
>