netdev - Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <578E0C61.6070705@iogearbox.net>
Date:	Tue, 19 Jul 2016 13:17:53 +0200
From:	Daniel Borkmann <daniel@...earbox.net>
To:	Sargun Dhillon <sargun@...gun.me>
CC:	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	alexei.starovoitov@...il.com
Subject: Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper
 to be called in tracers (kprobes)

Hi Sargun,

On 07/19/2016 11:32 AM, Sargun Dhillon wrote:
> This allows user memory to be written to during the course of a kprobe.
> It shouldn't be used to implement any kind of security mechanism
> because of TOC-TOU attacks, but rather to debug, divert, and
> manipulate execution of semi-cooperative processes.
>
> Although it uses probe_kernel_write, we limit the address space
> the probe can write into by checking the space with access_ok.
> This call shouldn't sleep on any architectures based on review.
>
> It was tested with the tracex7 program on x86-64.
>
> Signed-off-by: Sargun Dhillon <sargun@...gun.me>
> Cc: Alexei Starovoitov <ast@...nel.org>
[...]
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ebfbb7d..45878f3 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -81,6 +81,35 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
>   	.arg3_type	= ARG_ANYTHING,
>   };
>
> +static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
> +{
> +	void *to = (void *) (long) r1;
> +	void *from = (void *) (long) r2;
> +	int  size = (int) r3;

Nit: you have two spaces here?

> +	struct task_struct *task = current;
> +
> +	/* check if we're in a user context */
> +	if (unlikely(in_interrupt()))
> +		return -EINVAL;
> +	if (unlikely(!task || !task->pid))

Why !task->pid is needed? Can you point to the code where this is
set up as such? I'd assume if that's the case, there's a generic
helper for it to determine that the task_struct is a kernel task?

> +		return -EINVAL;
> +
> +	/* Is this a user address, or a kernel address? */
> +	if (!access_ok(VERIFY_WRITE, to, size))
> +		return -EINVAL;
> +
> +	return probe_kernel_write(to, from, size);

I'm still worried that this can lead to all kind of hard to find
bugs or races for user processes, if you make this writable to entire
user address space (which is the only thing that access_ok() checks
for). What if the BPF program has bugs and writes to wrong addresses
for example, introducing bugs in some other, non-intended processes
elsewhere? Can this be limited to syscalls only? And if so, to the
passed memory only?

Have you played around with ptrace() to check whether you could
achieve similar functionality (was thinking about things like [1],
PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't
this be limited to a similar functionality for only the current task.
ptrace() utilizes helpers like access_process_vm(), maybe this can
similarly be adapted here, too (under the circumstances that sleeping
is not allowed)?

   [1] https://code.google.com/archive/p/ptrace-parasite/

> +}
> +
> +static const struct bpf_func_proto bpf_copy_to_user_proto = {
> +	.func = bpf_copy_to_user,
> +	.gpl_only = false,
> +	.ret_type = RET_INTEGER,
> +	.arg1_type = ARG_ANYTHING,
> +	.arg2_type = ARG_PTR_TO_RAW_STACK,
> +	.arg3_type = ARG_CONST_STACK_SIZE,

Nit: please tab-align '=' like we have in the other *_proto cases in
bpf_trace.

> +};
> +
>   /*
>    * limited trace_printk()
>    * only %d %u %x %ld %lu %lx %lld %llu %llx %p %s conversion specifiers allowed

Thanks,
Daniel