linux-kernel - Re: Getting empty callchain from perf_callchain

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <842A0302-9B36-4FBF-ADF7-9C6749E8C5BE@fb.com>
Date:   Mon, 20 May 2019 17:22:12 +0000
From:   Song Liu <songliubraving@...com>
To:     Kairui Song <kasong@...hat.com>
CC:     Alexei Starovoitov <ast@...com>,
        Peter Zijlstra <peterz@...radead.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        "Josh Poimboeuf" <jpoimboe@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        "Daniel Borkmann" <daniel@...earbox.net>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: Re: Getting empty callchain from perf_callchain_kernel() 



> On May 19, 2019, at 11:07 AM, Kairui Song <kasong@...hat.com> wrote:
> 
> On Sat, May 18, 2019 at 5:48 AM Song Liu <songliubraving@...com> wrote:
>> 
>> 
>> 
>>> On May 17, 2019, at 2:06 PM, Alexei Starovoitov <ast@...com> wrote:
>>> 
>>> On 5/17/19 11:40 AM, Song Liu wrote:
>>>> +Alexei, Daniel, and bpf
>>>> 
>>>>> On May 17, 2019, at 2:10 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>>>>> 
>>>>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>>>>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>>>>>> some other bfp functions) is now broken, or, strating an unwind
>>>>>> directly inside a bpf program will end up strangely. It have following
>>>>>> kernel message:
>>>>> 
>>>>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
>>>>> follow.
>>>> 
>>>> I guess we need something like the following? (we should be able to
>>>> optimize the PER_CPU stuff).
>>>> 
>>>> Thanks,
>>>> Song
>>>> 
>>>> 
>>>> diff --git i/kernel/trace/bpf_trace.c w/kernel/trace/bpf_trace.c
>>>> index f92d6ad5e080..c525149028a7 100644
>>>> --- i/kernel/trace/bpf_trace.c
>>>> +++ w/kernel/trace/bpf_trace.c
>>>> @@ -696,11 +696,13 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_tp = {
>>>>        .arg5_type      = ARG_CONST_SIZE_OR_ZERO,
>>>> };
>>>> 
>>>> +static DEFINE_PER_CPU(struct pt_regs, bpf_stackid_tp_regs);
>>>> BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
>>>>           u64, flags)
>>>> {
>>>> -       struct pt_regs *regs = *(struct pt_regs **)tp_buff;
>>>> +       struct pt_regs *regs = this_cpu_ptr(&bpf_stackid_tp_regs);
>>>> 
>>>> +       perf_fetch_caller_regs(regs);
>>> 
>>> No. pt_regs is already passed in. It's the first argument.
>>> If we call perf_fetch_caller_regs() again the stack trace will be wrong.
>>> bpf prog should not see itself, interpreter or all the frames in between.
>> 
>> Thanks Alexei! I get it now.
>> 
>> In bpf_get_stackid_tp(), the pt_regs is get by dereferencing the first field
>> of tp_buff:
>> 
>>        struct pt_regs *regs = *(struct pt_regs **)tp_buff;
>> 
>> tp_buff points to something like
>> 
>>        struct sched_switch_args {
>>                unsigned long long pad;
>>                char prev_comm[16];
>>                int prev_pid;
>>                int prev_prio;
>>                long long prev_state;
>>                char next_comm[16];
>>                int next_pid;
>>                int next_prio;
>>        };
>> 
>> where the first field "pad" is a pointer to pt_regs.
>> 
>> @Kairui, I think you confirmed that current code will give empty call trace
>> with ORC unwinder? If that's the case, can we add regs->ip back? (as in the
>> first email of this thread.
>> 
>> Thanks,
>> Song
>> 
> 
> Hi thanks for the suggestion, yes we can add it should be good an idea
> to always have IP when stack trace is not available.
> But stack trace is actually still broken, it will always give only one
> level of stacktrace (the IP).

I think this is still the best fix/workaround here? And only one level 
of stack trace should be OK for tracepoint? 

Thanks,
Song

> 
> -- 
> Best Regards,
> Kairui Song