linux-kernel - Re: [PATCH v2 bpf-next 2/4] bpf: introduce helper bpf_get_task


Message-ID: <AD7AE0B3-94F9-4430-990C-85B9CF431EC7@fb.com>
Date:   Fri, 26 Jun 2020 23:47:37 +0000
From:   Song Liu <songliubraving@...com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
CC:     bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Alexei Starovoitov" <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        "Kernel Team" <Kernel-team@...com>,
        john fastabend <john.fastabend@...il.com>,
        "KP Singh" <kpsingh@...omium.org>
Subject: Re: [PATCH v2 bpf-next 2/4] bpf: introduce helper bpf_get_task_stack()



> On Jun 26, 2020, at 3:51 PM, Andrii Nakryiko <andrii.nakryiko@...il.com> wrote:
> 
> On Fri, Jun 26, 2020 at 3:45 PM Song Liu <songliubraving@...com> wrote:
>> 
>> 
>> 
>>> On Jun 26, 2020, at 1:17 PM, Andrii Nakryiko <andrii.nakryiko@...il.com> wrote:
>>> 
>>> On Thu, Jun 25, 2020 at 5:14 PM Song Liu <songliubraving@...com> wrote:
>>>> 
>>>> Introduce helper bpf_get_task_stack(), which dumps stack trace of given
>>>> task. This is different to bpf_get_stack(), which gets stack trace of
>>>> current task. One potential use case of bpf_get_task_stack() is to call
>>>> it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file.
>>>> 
>>>> bpf_get_task_stack() uses stack_trace_save_tsk() instead of
>>>> get_perf_callchain() for kernel stack. The benefit of this choice is that
>>>> stack_trace_save_tsk() doesn't require changes in arch/. The downside of
>>>> using stack_trace_save_tsk() is that stack_trace_save_tsk() dumps the
>>>> stack trace to unsigned long array. For 32-bit systems, we need to
>>>> translate it to u64 array.
>>>> 
>>>> Signed-off-by: Song Liu <songliubraving@...com>
>>>> ---
>>> 
>>> Looks great, I just think that there are cases where user doesn't
>>> necessarily have a valid task_struct pointer, just pid, so would be nice
>>> to not artificially restrict such cases by having extra helper.
>>> 
>>> Acked-by: Andrii Nakryiko <andriin@...com>
>> 
>> Thanks!
>> 
>>> 
>>>> include/linux/bpf.h            |  1 +
>>>> include/uapi/linux/bpf.h       | 35 ++++++++++++++-
>>>> kernel/bpf/stackmap.c          | 79 ++++++++++++++++++++++++++++++++--
>>>> kernel/trace/bpf_trace.c       |  2 +
>>>> scripts/bpf_helpers_doc.py     |  2 +
>>>> tools/include/uapi/linux/bpf.h | 35 ++++++++++++++-
>>>> 6 files changed, 149 insertions(+), 5 deletions(-)
>>>> 
>>> 
>>> [...]
>>> 
>>>> +       /* stack_trace_save_tsk() works on unsigned long array, while
>>>> +        * perf_callchain_entry uses u64 array. For 32-bit systems, it is
>>>> +        * necessary to fix this mismatch.
>>>> +        */
>>>> +       if (__BITS_PER_LONG != 64) {
>>>> +               unsigned long *from = (unsigned long *) entry->ip;
>>>> +               u64 *to = entry->ip;
>>>> +               int i;
>>>> +
>>>> +               /* copy data from the end to avoid using extra buffer */
>>>> +               for (i = entry->nr - 1; i >= (int)init_nr; i--)
>>>> +                       to[i] = (u64)(from[i]);
>>> 
>>> doing this forward would be just fine as well, no? First iteration
>>> will cast and overwrite low 32-bits, all the subsequent iterations
>>> won't even overlap.
>> 
>> I think first iteration will write zeros to higher 32 bits, no?
> 
> Oh, wait, I completely misread what this is doing. It up-converts from
> 32-bit to 64-bit, sorry. Yeah, ignore me on this :)
> 
> But then I have another question. How do you know that entry->ip has
> enough space to keep the same number of 2x bigger entries?

The buffer is sized for sysctl_perf_event_max_stack u64 numbers.
stack_trace_save_tsk() will put at most sysctl_perf_event_max_stack
unsigned long values in it (init_nr == 0). So the buffer is big enough.
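
To illustrate the sizing argument and why the copy has to run from the end, here is a minimal userspace sketch, not the kernel code. It assumes a 32-bit unsigned long (modelled as uint32_t) and init_nr == 0. MAX_STACK, struct demo_entry, demo_save_narrow() and demo_widen_in_place() are hypothetical names made up for this example; MAX_STACK stands in for sysctl_perf_event_max_stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sysctl_perf_event_max_stack. */
#define MAX_STACK 8

/* Buffer with room for MAX_STACK u64 slots, like entry->ip above. */
struct demo_entry {
	uint64_t nr;
	uint64_t ip[MAX_STACK];
};

/*
 * Stand-in for stack_trace_save_tsk() on a 32-bit system: store at most
 * MAX_STACK 32-bit values at the start of e->ip and record the count.
 * They occupy only the first half of the buffer.
 */
static void demo_save_narrow(struct demo_entry *e, const uint32_t *src,
			     unsigned int n)
{
	if (n > MAX_STACK)
		n = MAX_STACK;
	memcpy(e->ip, src, n * sizeof(uint32_t));
	e->nr = n;
}

/*
 * Widen the packed 32-bit values to u64 in place. Going from the last
 * entry to the first, the 8-byte store to slot i only touches bytes at
 * offset 8*i and above, while every value still to be read (index < i)
 * ends before offset 4*i, so nothing unread is overwritten. A forward
 * loop would clobber entry 1 when storing slot 0.
 */
static void demo_widen_in_place(struct demo_entry *e)
{
	const unsigned char *from = (const unsigned char *)e->ip;
	int i;

	for (i = (int)e->nr - 1; i >= 0; i--) {
		uint32_t v;

		/* read the narrow value before the wide store */
		memcpy(&v, from + (size_t)i * sizeof(v), sizeof(v));
		e->ip[i] = v;
	}
}
```

Since the number of entries is capped at the number of u64 slots, the widened result fits exactly in the worst case, which is the point made above.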

Thanks,
Song