Message-Id: <20220304232832.764156-1-namhyung@kernel.org>
Date:   Fri,  4 Mar 2022 15:28:32 -0800
From:   Namhyung Kim <namhyung@...nel.org>
To:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>
Cc:     Martin KaFai Lau <kafai@...com>, Song Liu <songliubraving@...com>,
        Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, Eugene Loh <eugene.loh@...cle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Hao Luo <haoluo@...gle.com>
Subject: [RFC] A couple of issues on BPF callstack

Hello,

While I was working on lock contention tracepoints [1] for future BPF
use, I found some issues with stack traces in BPF programs.  Maybe
there are things that I missed, but I'd like to share my thoughts and
get your feedback.  Please correct me if I'm wrong.

The first thing I found is how bpf_get_stack{,id} handles skipped
frames.  Initially I wanted a short stack trace, say 4 entries deep,
to identify callers quickly, but it turned out that 4 is not enough
because the trace is entirely filled with the BPF code itself.

So I set it to skip 4 frames, but it always returned an error
(-EFAULT).  After some time I figured out that BPF doesn't allow the
number of skipped frames to be greater than or equal to the buffer
size.  This seems strange and looks like a bug.  I then found a bug
report (and a partial fix) [2] and am working on a full fix now.
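
For reference, a minimal sketch of what I was trying (untested; the
attach point is the proposed lock contention tracepoint from [1], and
the skip count goes in the low bits of the flags argument):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define MAX_DEPTH       4
#define SKIP_FRAMES     4       /* low bits of flags = frames to skip */

SEC("tracepoint/lock/contention_begin")
int probe(void *ctx)
{
        __u64 ips[MAX_DEPTH];
        long ret;

        /* Expected: 4 caller addresses after skipping the first 4
         * (BPF-internal) frames.  Actual: -EFAULT, since a skip count
         * greater than or equal to the buffer size is rejected.
         */
        ret = bpf_get_stack(ctx, ips, sizeof(ips), SKIP_FRAMES);
        if (ret < 0)
                bpf_printk("bpf_get_stack: %ld", ret);

        return 0;
}

char LICENSE[] SEC("license") = "GPL";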

But working on the fix revealed another problem with BPF programs on
perf_event, which use a variant of the stack trace functions.  The
difference is that they need to use the callchain in the perf sample
data.  The perf callchain is saved from the beginning, while the BPF
callchain is saved at the last moment to limit the stack depth by the
buffer size.  But I can handle that.
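
For context, the perf_event path looks something like this (sketch
only; here bpf_get_stackid() takes its input from the callchain
already in the sample data rather than unwinding at helper-call time):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_STACK_TRACE);
        __uint(max_entries, 1024);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, 127 * sizeof(__u64)); /* PERF_MAX_STACK_DEPTH */
} stacks SEC(".maps");

SEC("perf_event")
int on_sample(struct bpf_perf_event_data *ctx)
{
        /* skip count in the low bits of flags, as before */
        long id = bpf_get_stackid(ctx, &stacks, 4);

        if (id < 0)
                bpf_printk("bpf_get_stackid: %ld", id);
        return 0;
}

char LICENSE[] SEC("license") = "GPL";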

More important to me is the content of the (perf) callchain.  If the
event has __PERF_SAMPLE_CALLCHAIN_EARLY, the callchain will contain
context info like PERF_CONTEXT_KERNEL.  So users might or might not
see it depending on whether the perf_event was set up with precise_ip
and SAMPLE_CALLCHAIN.  This doesn't look good.
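
To illustrate, those markers are the enum perf_callchain_context
values from the uapi header, and today a consumer has to filter them
out by hand, roughly like this (untested sketch):

#include <stdio.h>
#include <stdint.h>
#include <linux/perf_event.h>   /* PERF_CONTEXT_KERNEL, PERF_CONTEXT_MAX, ... */

static void print_callchain(const uint64_t *ips, int nr)
{
        for (int i = 0; i < nr; i++) {
                /* Context markers (e.g. PERF_CONTEXT_KERNEL == (u64)-128)
                 * are not real addresses; skip them explicitly. */
                if (ips[i] >= (uint64_t)PERF_CONTEXT_MAX)
                        continue;
                printf("  %#llx\n", (unsigned long long)ips[i]);
        }
}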

After all, I think it'd be really great if we could skip that
uninteresting info easily.  Maybe we could add flags to skip the BPF
code, the perf context entries, and even some scheduler code from the
trace, like stack_trace_consume_entry_nosched() does.
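
For reference, the nosched variant in kernel/stacktrace.c is roughly
this (quoting from memory); I'm imagining similar opt-in filtering
controlled by the bpf_get_stack{,id} flags:

/* Skip scheduler functions while walking the stack, instead of
 * post-processing the result.
 */
static bool stack_trace_consume_entry_nosched(void *cookie,
                                              unsigned long addr)
{
        if (in_sched_functions(addr))
                return true;    /* keep walking, but don't record */
        return stack_trace_consume_entry(cookie, addr);
}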

Thoughts?

Thanks,
Namhyung


[1] https://lore.kernel.org/all/20220301010412.431299-1-namhyung@kernel.org/
[2] https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com/
