netdev - Re: [RFC] A couple of issues on BPF callstack


Message-ID: <38f99862-e5f4-0688-b5ef-43fa6584b823@fb.com>
Date:   Sat, 5 Mar 2022 16:28:17 -0800
From:   Yonghong Song <yhs@...com>
To:     Namhyung Kim <namhyung@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>
Cc:     Martin KaFai Lau <kafai@...com>, Song Liu <songliubraving@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, Eugene Loh <eugene.loh@...cle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Hao Luo <haoluo@...gle.com>
Subject: Re: [RFC] A couple of issues on BPF callstack



On 3/4/22 3:28 PM, Namhyung Kim wrote:
> Hello,
> 
> While I'm working on lock contention tracepoints [1] for a future BPF
> use, I found some issues on the stack trace in BPF programs.  Maybe
> there are things that I missed but I'd like to share my thoughts for
> your feedback.  So please correct me if I'm wrong.
> 
> The first thing I found is how it handles skipped frames in the
> bpf_get_stack{,id}.  Initially I wanted a short stack trace like 4
> depth to identify callers quickly, but it turned out that 4 is not
> enough and it's all filled with the BPF code itself.
> 
> So I set it to skip 4 frames, but it always returns an error (-EFAULT).
> After some time I figured out that BPF doesn't allow the number of
> skipped frames to be greater than or equal to the buffer size.  This
> seems strange and looks like a bug.  Then I found a bug report (and a
> partial fix) [2] and am working on a full fix now.

Thanks for volunteering. Looking forward to the patch.
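
The -EFAULT reported above (skip of 4 with a 4-entry buffer) can be
illustrated with a small userspace model. This is only a sketch, assuming
the helper caps the collected frames at the buffer capacity before applying
the skip count, as the report describes. The function names are
hypothetical, this is not the kernel code, and the second function is just
one ordering a fix might plausibly use.

```c
#include <errno.h>

/*
 * Simplified userspace model, NOT the kernel implementation.
 *   depth:    frames available on the stack
 *   capacity: number of entries the caller's buffer can hold
 *   skip:     number of leading frames the caller wants dropped
 * Each function returns the number of frames handed back, or -EFAULT.
 */

/* Reported behaviour (assumed): cap at the buffer capacity, then skip. */
static int model_limit_then_skip(int depth, int capacity, int skip)
{
    int collected = depth < capacity ? depth : capacity;

    /* Fails whenever skip >= capacity, however deep the stack is. */
    if (collected <= skip)
        return -EFAULT;
    return collected - skip;
}

/* Hypothetical alternative: skip first, then cap at the buffer capacity. */
static int model_skip_then_limit(int depth, int capacity, int skip)
{
    int remaining;

    /* Fails only when the stack itself is not deeper than skip. */
    if (depth <= skip)
        return -EFAULT;
    remaining = depth - skip;
    return remaining < capacity ? remaining : capacity;
}
```

With a stack 10 frames deep, a 4-entry buffer and a skip of 4, the first
model returns -EFAULT while the second returns 4 frames.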

> 
> But it revealed another problem with BPF programs on perf_event, which
> use a variant of the stack trace functions.  The difference is that it
> needs to use the callchain in the perf sample data.  The perf callchain
> is saved from the beginning, while the BPF callchain is saved at the end
> to limit the stack depth by the buffer size.  But I can handle that.
> 
> A more important thing to me is the content of the (perf) callchain.  If
> the event has __PERF_SAMPLE_CALLCHAIN_EARLY, it will have context info
> like PERF_CONTEXT_KERNEL.  So the user might or might not see it depending
> on whether the perf_event is set with precise_ip and SAMPLE_CALLCHAIN.
> This doesn't look good.

Patch 7b04d6d60fcf ("bpf: Separate bpf_get_[stack|stackid] for
perf events BPF") tried to fix the __PERF_SAMPLE_CALLCHAIN_EARLY issue
for the bpf_get_stack[id]() helpers.
The helpers check whether event->attr.sample_type has
__PERF_SAMPLE_CALLCHAIN_EARLY set, and the stacks are retrieved
accordingly.
Did you see any issue here?
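
A rough userspace model of the kind of check described above, as a sketch
only. The MODEL_* constants are local copies of the values as I understand
them from the perf uapi header, and the function names are hypothetical.
The kernel helpers operate on struct perf_event and the perf callchain
rather than on plain arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins (assumed values) for the perf uapi definitions. */
#define MODEL_SAMPLE_CALLCHAIN_EARLY (1ULL << 63)
#define MODEL_CONTEXT_KERNEL         ((uint64_t)-128)
#define MODEL_CONTEXT_USER           ((uint64_t)-512)

/* Non-zero if sample_type carries the early-callchain bit. */
static int model_has_early_callchain(uint64_t sample_type)
{
    return (sample_type & MODEL_SAMPLE_CALLCHAIN_EARLY) != 0;
}

/*
 * Count the entries that precede the user-context marker. The count
 * includes a leading kernel-context marker if one is present, since
 * the marker occupies a slot in the callchain like any other entry.
 */
static size_t model_count_kernel_entries(const uint64_t *ips, size_t nr)
{
    size_t n = 0;

    while (n < nr && ips[n] != MODEL_CONTEXT_USER)
        n++;
    return n;
}
```

For a callchain holding a kernel marker, two kernel addresses, a user
marker and one user address, the count is 3, which shows how the context
markers take up entries that a caller might expect to hold real frames.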

> 
> After all, I think it'd be really great if we could skip that
> uninteresting info easily.  Maybe we could add a flag to skip BPF code,

We cannot just skip those callchains with __PERF_SAMPLE_CALLCHAIN_EARLY.
There are real use cases for it.

> perf context, and even some scheduler code from the trace respectively
> like in stack_trace_consume_entry_nosched().

A flag for the bpf_get_stack[id]() helpers? It is possible. It would be
great if you can detail your use case here and how a flag could help
you.

> 
> Thoughts?
> 
> Thanks,
> Namhyung
> 
> 
> [1] https://lore.kernel.org/all/20220301010412.431299-1-namhyung@kernel.org/
> [2] https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com/