linux-kernel - Re: [PATCH v3 bpf-next 1/2] bpf: separate bpf_get

Message-ID: <42DEE452-F411-4098-917B-11B23AC99F5F@fb.com>
Date:   Tue, 21 Jul 2020 22:40:19 +0000
From:   Song Liu <songliubraving@...com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
CC:     open list <linux-kernel@...r.kernel.org>,
        bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>,
        "john.fastabend@...il.com" <john.fastabend@...il.com>,
        "kpsingh@...omium.org" <kpsingh@...omium.org>,
        "brouer@...hat.com" <brouer@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>
Subject: Re: [PATCH v3 bpf-next 1/2] bpf: separate bpf_get_[stack|stackid] for
 perf events BPF



> On Jul 21, 2020, at 12:10 PM, Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
> 
> On Thu, Jul 16, 2020 at 03:59:32PM -0700, Song Liu wrote:
>> +
>> +BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
>> +	   struct bpf_map *, map, u64, flags)
>> +{
>> +	struct perf_event *event = ctx->event;
>> +	struct perf_callchain_entry *trace;
>> +	bool has_kernel, has_user;
>> +	bool kernel, user;
>> +
>> +	/* perf_sample_data doesn't have callchain, use bpf_get_stackid */
>> +	if (!(event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
> 
> what if event was not created with PERF_SAMPLE_CALLCHAIN ?
> Calling the helper will still cause crashes, no?

Yeah, it may still crash. Somehow I messed up this logic...

> 
>> +		return bpf_get_stackid((unsigned long)(ctx->regs),
>> +				       (unsigned long) map, flags, 0, 0);
>> +
>> +	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
>> +			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
>> +		return -EINVAL;
>> +
>> +	user = flags & BPF_F_USER_STACK;
>> +	kernel = !user;
>> +
>> +	has_kernel = !event->attr.exclude_callchain_kernel;
>> +	has_user = !event->attr.exclude_callchain_user;
>> +
>> +	if ((kernel && !has_kernel) || (user && !has_user))
>> +		return -EINVAL;
> 
> this will break existing users in a way that will be very hard for them to debug.
> If they happen to set exclude_callchain_* flags during perf_event_open
> the helpers will be failing at run-time.
> One can argue that when precise_ip=1 the bpf_get_stack is broken, but
> this is a change in behavior.
> It also seems to be broken when PERF_SAMPLE_CALLCHAIN was not set at event
> creation time, but precise_ip=1 was.
> 
>> +
>> +	trace = ctx->data->callchain;
>> +	if (unlikely(!trace))
>> +		return -EFAULT;
>> +
>> +	if (has_kernel && has_user) {
> 
> shouldn't it be || ?

It should be &&. We only need to adjust the attached callchain when it
contains both kernel and user stacks.
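
To illustrate the adjustment, here is a rough user-space sketch, not the patch code. It assumes the sampled callchain stores kernel entries first, followed by a user-context marker and then the user entries. The struct, the names count_kernel_entries() and select_window(), and the CTX_* macros are hypothetical stand-ins; the marker values mirror PERF_CONTEXT_KERNEL and PERF_CONTEXT_USER from the uapi perf_event.h header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same values as PERF_CONTEXT_KERNEL / PERF_CONTEXT_USER in
 * include/uapi/linux/perf_event.h, redefined so this builds in user space. */
#define CTX_KERNEL ((uint64_t)-128)
#define CTX_USER   ((uint64_t)-512)

/* Hypothetical stand-in for struct perf_callchain_entry. */
struct callchain {
	uint64_t nr;      /* number of valid entries in ip[] */
	uint64_t ip[16];  /* context markers and return addresses */
};

/* Count the entries that precede the user-context marker. The kernel
 * marker itself is counted, because it occupies a slot in ip[]. */
static uint64_t count_kernel_entries(const struct callchain *c)
{
	uint64_t n = 0;

	while (n < c->nr && c->ip[n] != CTX_USER)
		n++;
	return n;
}

/* Window of the callchain that a stack lookup should see. */
struct window {
	uint64_t skip;  /* entries to skip from the start */
	uint64_t nr;    /* entry count to expose */
};

/* Kernel request: keep the caller's skip, truncate nr to the kernel part.
 * User request: keep nr, skip past the kernel part as well. */
static struct window select_window(const struct callchain *c,
				   int want_user, uint64_t skip)
{
	struct window w;
	uint64_t nr_kernel;

	assert(c != NULL);
	nr_kernel = count_kernel_entries(c);

	if (want_user) {
		w.skip = skip + nr_kernel;
		w.nr = c->nr;
	} else {
		w.skip = skip;
		w.nr = nr_kernel;
	}
	return w;
}
```

When the callchain holds only one of the two stacks, no window adjustment is needed, which is why the check is && rather than ||.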

> 
>> +		__u64 nr_kernel = count_kernel_ip(trace);
>> +		int ret;
>> +
>> +		if (kernel) {
>> +			__u64 nr = trace->nr;
>> +
>> +			trace->nr = nr_kernel;
>> +			ret = __bpf_get_stackid(map, trace, flags);
>> +
>> +			/* restore nr */
>> +			trace->nr = nr;
>> +		} else { /* user */
>> +			u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> +
>> +			skip += nr_kernel;
>> +			if (skip > BPF_F_SKIP_FIELD_MASK)
>> +				return -EFAULT;
>> +
>> +			flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
>> +			ret = __bpf_get_stackid(map, trace, flags);
>> +		}
>> +		return ret;
>> +	}
>> +	return __bpf_get_stackid(map, trace, flags);
> ...
>> +	if (has_kernel && has_user) {
>> +		__u64 nr_kernel = count_kernel_ip(trace);
>> +		int ret;
>> +
>> +		if (kernel) {
>> +			__u64 nr = trace->nr;
>> +
>> +			trace->nr = nr_kernel;
>> +			ret = __bpf_get_stack(ctx->regs, NULL, trace, buf,
>> +					      size, flags);
>> +
>> +			/* restore nr */
>> +			trace->nr = nr;
>> +		} else { /* user */
>> +			u64 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> +
>> +			skip += nr_kernel;
>> +			if (skip > BPF_F_SKIP_FIELD_MASK)
>> +				goto clear;
>> +
>> +			flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
>> +			ret = __bpf_get_stack(ctx->regs, NULL, trace, buf,
>> +					      size, flags);
>> +		}
> 
> Looks like copy-paste. I think there should be a way to make it
> into common helper.

I thought about moving this logic into a helper. But we call
__bpf_get_stackid() above and __bpf_get_stack() here, so we can't
easily put all the logic in one big helper. Multiple small helpers
look messy (to me).

> 
> I think the main issue is wrong interaction with event attr flags.
> I think the verifier should detect that bpf_get_stack/bpf_get_stackid
> were used and prevent attaching to perf_event with attr.precise_ip=1
> and PERF_SAMPLE_CALLCHAIN is not specified.
> I was thinking whether attaching bpf to event can force setting of
> PERF_SAMPLE_CALLCHAIN, but that would be a surprising behavior,
> so not a good idea.
> So the only thing left is to reject attach when bpf_get_stack is used
> in two cases:
> if attr.precise_ip=1 and PERF_SAMPLE_CALLCHAIN is not set.
>  (since it will lead to crashes)

We only need to block precise_ip >= 2. precise_ip == 1 is OK. 

> if attr.precise_ip=1 and PERF_SAMPLE_CALLCHAIN is set,
> but exclude_callchain_[user|kernel]=1 is set too.
>  (since it will lead to surprising behavior of bpf_get_stack)
> 
> Other ideas?

Yes, this sounds good. 
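
A minimal sketch of the attach-time rule as discussed above, written as plain user-space C rather than kernel code. The struct attr_view type, the check_attach() name, and the -EINVAL return value are assumptions made for illustration; the thread does not settle on an error code. The sketch also assumes the precise_ip >= 2 threshold applies to both cases.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down view of the perf_event_attr fields involved. */
struct attr_view {
	unsigned int precise_ip;        /* 0..3, as in perf_event_attr */
	bool sample_callchain;          /* PERF_SAMPLE_CALLCHAIN was requested */
	bool exclude_callchain_kernel;
	bool exclude_callchain_user;
};

/* Return 0 if attaching may proceed, a negative errno otherwise.
 * prog_uses_get_stack models the verifier having seen a call to
 * bpf_get_stack() or bpf_get_stackid() in the program. */
static int check_attach(const struct attr_view *a, bool prog_uses_get_stack)
{
	/* Programs that never fetch a stack, and events with
	 * precise_ip below 2, are not restricted. */
	if (!prog_uses_get_stack || a->precise_ip < 2)
		return 0;

	/* Case 1: no callchain was sampled, so the helper could crash. */
	if (!a->sample_callchain)
		return -EINVAL;

	/* Case 2: a callchain exists but one side is excluded, so the
	 * helper would behave in a surprising way. */
	if (a->exclude_callchain_kernel || a->exclude_callchain_user)
		return -EINVAL;

	return 0;
}
```

With this shape, an incompatible event is rejected once at attach time instead of the helper failing on every sample at run time.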

Thanks,
Song