Message-ID: <38f99862-e5f4-0688-b5ef-43fa6584b823@fb.com>
Date: Sat, 5 Mar 2022 16:28:17 -0800
From: Yonghong Song <yhs@...com>
To: Namhyung Kim <namhyung@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>
Cc: Martin KaFai Lau <kafai@...com>, Song Liu <songliubraving@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, netdev@...r.kernel.org,
bpf@...r.kernel.org, Eugene Loh <eugene.loh@...cle.com>,
Peter Zijlstra <peterz@...radead.org>,
Hao Luo <haoluo@...gle.com>
Subject: Re: [RFC] A couple of issues on BPF callstack
On 3/4/22 3:28 PM, Namhyung Kim wrote:
> Hello,
>
> While I'm working on lock contention tracepoints [1] for a future BPF
> use, I found some issues with stack traces in BPF programs. Maybe
> there are things that I missed but I'd like to share my thoughts for
> your feedback. So please correct me if I'm wrong.
>
> The first thing I found is how skipped frames are handled in
> bpf_get_stack{,id}(). Initially I wanted a short stack trace with a
> depth of 4 to identify callers quickly, but it turned out that 4 is not
> enough and it's all filled with the BPF code itself.
>
> So I set it to skip 4 frames, but it always returns an error (-EFAULT).
> After some time I figured out that BPF doesn't allow setting the number
> of skipped frames greater than or equal to the buffer size. This seems
> strange and looks like a bug. Then I found a bug report (and a partial
> fix) [2] and am working on a full fix now.
Thanks for volunteering. Looking forward to the patch.
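The skip limitation described above can be illustrated with a simplified user-space model (not kernel code). It assumes a hypothetical call chain whose four innermost frames belong to BPF itself. Capturing only as many frames as the buffer holds and then skipping leaves nothing once skip reaches the buffer length; skipping first and then filling the buffer does not have that problem. The function names and the -1 return (standing in for -EFAULT) are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical captured call chain, innermost frame first: four frames
 * inside BPF itself, followed by the callers we actually want. */
static const unsigned long frames[] = {
	0x1000, 0x1001, 0x1002, 0x1003, /* BPF-internal frames */
	0x2000, 0x2001, 0x2002, 0x2003, /* interesting callers */
};
#define NR_FRAMES (sizeof(frames) / sizeof(frames[0]))

/* Capture at most buf_len frames, then skip: once skip reaches the number
 * of captured frames nothing is left (-1 stands in for -EFAULT). */
static int capture_then_skip(unsigned long *buf, size_t buf_len, size_t skip)
{
	size_t captured = NR_FRAMES < buf_len ? NR_FRAMES : buf_len;

	assert(buf != NULL);
	if (captured <= skip)
		return -1;
	memcpy(buf, frames + skip, (captured - skip) * sizeof(*buf));
	return (int)(captured - skip);
}

/* Skip first, then capture up to buf_len of the remaining frames. */
static int skip_then_capture(unsigned long *buf, size_t buf_len, size_t skip)
{
	assert(buf != NULL);
	if (skip >= NR_FRAMES)
		return -1;

	size_t n = NR_FRAMES - skip;
	if (n > buf_len)
		n = buf_len;
	memcpy(buf, frames + skip, n * sizeof(*buf));
	return (int)n;
}
```

With a 4-entry buffer and skip = 4, the first model fails while the second returns the four caller frames.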
>
> But it revealed another problem with BPF programs on perf_event which
> use a variant of the stack trace functions. The difference is that it
> needs to use the callchain in the perf sample data. The perf callchain
> is saved from the beginning, while the BPF callchain is saved at the end
> to limit the stack depth to the buffer size. But I can handle that.
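The layout difference mentioned above can be modeled as follows: a consumer has to know where the valid entries start before it applies skip. This is a hypothetical user-space sketch; MAX_STACK stands in for sysctl_perf_event_max_stack, and struct chain and copy_frames() are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STACK 8 /* hypothetical stand-in for sysctl_perf_event_max_stack */

/* Simplified callchain: nr counts used slots, including any leading
 * slots that were reserved but never written. */
struct chain {
	size_t nr;
	unsigned long ip[MAX_STACK];
};

/* Copy up to buf_len frames after skipping `skip`, given where the valid
 * entries start. start == 0 models a chain filled from the beginning;
 * start == MAX_STACK - buf_len models a chain placed at the end of the
 * buffer so that at most buf_len frames fit. */
static size_t copy_frames(const struct chain *c, size_t start, size_t skip,
			  unsigned long *buf, size_t buf_len)
{
	assert(start <= c->nr);
	size_t valid = c->nr - start;
	if (valid <= skip)
		return 0;

	size_t n = valid - skip;
	if (n > buf_len)
		n = buf_len;
	memcpy(buf, c->ip + start + skip, n * sizeof(*buf));
	return n;
}
```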
>
> A more important thing to me is the content of the (perf) callchain. If
> the event has __PERF_SAMPLE_CALLCHAIN_EARLY, it will have context info
> like PERF_CONTEXT_KERNEL. So the user might or might not see it depending
> on whether the perf_event is set with precise_ip and SAMPLE_CALLCHAIN.
> This doesn't look good.
Patch 7b04d6d60fcf ("bpf: Separate bpf_get_[stack|stackid] for
perf events BPF") tried to fix the __PERF_SAMPLE_CALLCHAIN_EARLY issue
for the bpf_get_stack[id]() helpers.
The helpers check whether event->attr.sample_type has
__PERF_SAMPLE_CALLCHAIN_EARLY set and retrieve the stack
accordingly.
Did you see any issue here?
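For reference, perf callchains can interleave marker entries such as PERF_CONTEXT_KERNEL and PERF_CONTEXT_USER with real addresses; in the perf UAPI every such marker is at or above PERF_CONTEXT_MAX. The sketch below shows how a consumer could drop them. It is a user-space illustration only: the CTX_* constants mirror the UAPI values, and strip_context_markers() is a hypothetical helper, not something the bpf_get_stack[id]() helpers do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirrored from enum perf_callchain_context in <linux/perf_event.h>,
 * redefined here so the sketch is self-contained. */
#define CTX_KERNEL ((uint64_t)-128)
#define CTX_USER   ((uint64_t)-512)
#define CTX_MAX    ((uint64_t)-4095)

/* Copy only real addresses from `in` to `out`; entries >= CTX_MAX are
 * context markers. Returns the number of addresses kept. */
static size_t strip_context_markers(const uint64_t *in, size_t nr,
				    uint64_t *out)
{
	size_t kept = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < nr; i++) {
		if (in[i] >= CTX_MAX)
			continue; /* CTX_KERNEL, CTX_USER, and other markers */
		out[kept++] = in[i];
	}
	return kept;
}
```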
>
> After all, I think it'd be really great if we can skip those
> uninteresting info easily. Maybe we could add a flag to skip BPF code,
> perf context, and even some scheduler code from the trace respectively
> like in stack_trace_consume_entry_nosched().

We cannot just skip those callchains with __PERF_SAMPLE_CALLCHAIN_EARLY.
There are real use cases for it.

A flag for the bpf_get_stack[id]() helpers? It is possible. It would be
great if you could detail your use case here and how a flag could help
you.
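The filtering idea above resembles how stack_trace_consume_entry_nosched() drops scheduler frames by returning early from the per-entry consume callback. A minimal user-space sketch of that pattern follows, assuming hypothetical address ranges in place of in_sched_functions(); struct trace_buf, consume_entry_filtered() and the range list are illustrative names, and no specific bpf_get_stack[id]() flag is implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical [start, end) address range to filter out, standing in for
 * the scheduler text that in_sched_functions() checks in the kernel. */
struct addr_range {
	unsigned long start, end;
};

/* Hypothetical trace buffer passed as the callback cookie. */
struct trace_buf {
	unsigned long *store;
	size_t len, size;
	const struct addr_range *skip;
	size_t nr_skip;
};

static bool in_skip_ranges(const struct trace_buf *t, unsigned long addr)
{
	for (size_t i = 0; i < t->nr_skip; i++)
		if (addr >= t->skip[i].start && addr < t->skip[i].end)
			return true;
	return false;
}

/* Per-frame callback: returns true to keep walking, false to stop.
 * Filtered frames are dropped without using a buffer slot. */
static bool consume_entry_filtered(void *cookie, unsigned long addr)
{
	struct trace_buf *t = cookie;

	assert(t != NULL);
	if (in_skip_ranges(t, addr))
		return true;
	if (t->len >= t->size)
		return false;
	t->store[t->len++] = addr;
	return t->len < t->size;
}
```

A stack walker would call the callback once per frame and stop as soon as it returns false, so filtered frames never reduce the usable depth.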
>
> Thoughts?
>
> Thanks,
> Namhyung
>
>
> [1] https://lore.kernel.org/all/20220301010412.431299-1-namhyung@kernel.org/
> [2] https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com/