Message-ID: <87o7k5fxwx.fsf@all.your.base.are.belong.to.us>
Date: Fri, 21 Jul 2023 10:53:34 +0200
From: Björn Töpel <bjorn@...nel.org>
To: Pu Lehui <pulehui@...wei.com>, Pu Lehui <pulehui@...weicloud.com>,
bpf@...r.kernel.org, linux-riscv@...ts.infradead.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Palmer Dabbelt <palmer@...belt.com>,
Guo Ren <guoren@...nel.org>, Song Shuai <suagrfillet@...il.com>
Subject: Re: [PATCH bpf] riscv, bpf: Adapt bpf trampoline to optimized riscv
ftrace framework
Pu Lehui <pulehui@...wei.com> writes:
> On 2023/7/19 23:18, Björn Töpel wrote:
>> Pu Lehui <pulehui@...wei.com> writes:
>>
>>> On 2023/7/19 4:06, Björn Töpel wrote:
>>>> Pu Lehui <pulehui@...weicloud.com> writes:
>>>>
>>>>> From: Pu Lehui <pulehui@...wei.com>
>>>>>
>>>>> Commit 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to
>>>>> half") halves the detour code size of kernel functions by using the T0
>>>>> register, and the upcoming DYNAMIC_FTRACE_WITH_DIRECT_CALLS for riscv
>>>>> is based on this optimization, so the riscv bpf trampoline needs to be
>>>>> adapted to it. Two things are needed: first, reduce the detour code
>>>>> size of bpf programs; second, handle the return address after the bpf
>>>>> trampoline has executed. Meanwhile, add more comments and rename some
>>>>> variables so they make more sense. The related tests pass.
>>>>>
>>>>> This adaptation needs to be merged before the upcoming
>>>>> DYNAMIC_FTRACE_WITH_DIRECT_CALLS for riscv; otherwise the kernel will
>>>>> crash due to a mismatched return address. So we target this change at
>>>>> the bpf tree and add a Fixes tag to help locate it.
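For readers following along, here is a minimal C sketch of the two-instruction detour the commit message refers to. It assumes the patched entry after 6724a76cff85 is `auipc t0, hi20` followed by `jalr t0, lo12(t0)`, so the return address into the traced function ends up in t0 rather than ra. `rv_split_offset()` and `rv_encode_detour()` are hypothetical helpers for illustration, not the kernel's own patching code; the offset is taken relative to the auipc and is assumed to fit the auipc/jalr range (roughly +/-2 GiB).

```c
#include <stdint.h>

#define RV_REG_T0    5u    /* x5 */
#define RV_OPC_AUIPC 0x17u
#define RV_OPC_JALR  0x67u

/* Split a PC-relative offset into the hi20/lo12 pair used by auipc+jalr.
 * jalr sign-extends lo12, so 0x800 is added before shifting: whenever
 * lo12 ends up negative, hi20 is one larger to compensate. */
static void rv_split_offset(int32_t off, int32_t *hi20, int32_t *lo12)
{
    *hi20 = (int32_t)(((uint32_t)off + 0x800u) >> 12);
    *lo12 = off - (int32_t)((uint32_t)*hi20 << 12);
}

/* Hypothetical helper: encode the detour
 *   auipc t0, hi20
 *   jalr  t0, lo12(t0)
 * which jumps to (address of auipc) + off and leaves the return address
 * (auipc + 8) in t0, so ra keeps the parent's return address. */
static void rv_encode_detour(int32_t off, uint32_t insns[2])
{
    int32_t hi20, lo12;

    rv_split_offset(off, &hi20, &lo12);
    /* U-type: imm[31:12] | rd | opcode */
    insns[0] = ((uint32_t)hi20 << 12) | (RV_REG_T0 << 7) | RV_OPC_AUIPC;
    /* I-type: imm[11:0] | rs1 | funct3=0 | rd | opcode */
    insns[1] = (((uint32_t)lo12 & 0xfffu) << 20) | (RV_REG_T0 << 15) |
               (RV_REG_T0 << 7) | RV_OPC_JALR;
}
```

Because jalr sign-extends lo12, an offset of 0x800 encodes as hi20 = 1 and lo12 = -0x800, which is why the split adds 0x800 before shifting.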
>>>>
>>>> Thank you for working on this!
>>>>
>>>>> Fixes: 6724a76cff85 ("riscv: ftrace: Reduce the detour code size to half")
>>>>
>>>> This is not a fix. Nothing is broken. It's just that this patch must
>>>> come before, or as part of, the ftrace series.
>>>
>>> Yep, it's really not a fix. Do you know whether this patch, if targeted
>>> at the bpf-next tree, can land ahead of the riscv tree's ftrace series?
>>
>> For this patch, I'd say it's easier to take it via the RISC-V tree, IFF
>> the ftrace series is in for-next.
>>
>
> Alright, then let's target the riscv tree to avoid that crash.
>
>> [...]
>>
>>>>> +#define DETOUR_NINSNS 2
>>>>
>>>> Better name? Maybe call this patchable function entry something? Also,
>>>
>>> How about RV_FENTRY_NINSNS?
>>
>> Sure. And more importantly, make sure it's actually used in the places
>> where the nops/skips are done.
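A sketch of what Björn asks for here, assuming RV_FENTRY_NINSNS (the name proposed above) counts 4-byte instructions and is the single source of truth both for emitting the nops and for skipping over them. `emit_fentry_nops()` and `skip_fentry()` are hypothetical names, not the functions in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* 4-byte instructions reserved at a patchable entry; the value assumes
 * the two-instruction auipc/jalr detour. */
#define RV_FENTRY_NINSNS 2
#define RV_INSN_BYTES    4
#define RV_NOP           0x00000013u /* addi x0, x0, 0 */

/* Hypothetical: fill the patchable entry of a JITed program with nops.
 * Returns the number of instructions written. */
static size_t emit_fentry_nops(uint32_t *buf)
{
    size_t i;

    for (i = 0; i < RV_FENTRY_NINSNS; i++)
        buf[i] = RV_NOP;
    return i;
}

/* Hypothetical: first instruction after the patchable entry, e.g. where
 * a trampoline's "call original function" path would jump to. */
static uintptr_t skip_fentry(uintptr_t func_addr)
{
    return func_addr + RV_FENTRY_NINSNS * RV_INSN_BYTES;
}
```

With both paths derived from one macro, shrinking or growing the detour again only needs a single change.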
>
> The new version does that.
>
>>
>>>> to catch future breaks like this -- would it make sense to have a
>>>> static_assert() combined with something tied to
>>>> -fpatchable-function-entry= from arch/riscv/Makefile?
>>>
>>> That would be very useful, but it doesn't look easy. I tried to find a
>>> related GCC builtin, something like __builtin_xxx, but haven't found one
>>> so far. I also tried exposing it as CONFIG_PATCHABLE_FUNCTION_ENTRY=4
>>> from the Makefile and then using static_assert, but that clearly isn't
>>> the right way to do it. Maybe we can deal with this later, once we have
>>> a solution?
>>
>> Ok!
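One possible shape for the static_assert() idea, under loud assumptions: the build would have to pass the -fpatchable-function-entry= count down as a macro (`PATCHABLE_NOPS` here is hypothetical, as is `HAVE_RVC` standing in for CONFIG_RISCV_ISA_C), and each compiler-emitted NOP is assumed to be 2 bytes with the C extension and 4 bytes without. That plumbing is exactly what, per the thread, doesn't exist yet, so treat this as a sketch. In the kernel, static_assert() comes from <linux/build_bug.h> rather than <assert.h>.

```c
#include <assert.h>

/* Hypothetical: NOP count passed from the Makefile, e.g. -DPATCHABLE_NOPS=4.
 * Defaulted here so the sketch compiles standalone. */
#ifndef PATCHABLE_NOPS
#define PATCHABLE_NOPS 4
#endif

/* Hypothetical stand-in for CONFIG_RISCV_ISA_C: with RVC the compiler is
 * assumed to emit 2-byte c.nop, otherwise 4-byte nop. */
#ifndef HAVE_RVC
#define HAVE_RVC 1
#endif
#define NOP_BYTES (HAVE_RVC ? 2 : 4)

#define RV_FENTRY_NINSNS 2

/* Break the build if the compiler-reserved area and the JIT's idea of the
 * detour size ever drift apart. */
static_assert(PATCHABLE_NOPS * NOP_BYTES == RV_FENTRY_NINSNS * 4,
              "patchable function entry does not match RV_FENTRY_NINSNS");

static int fentry_area_bytes(void)
{
    return PATCHABLE_NOPS * NOP_BYTES;
}
```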
>>
>> [...]
>>
>>>>> @@ -787,20 +762,19 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
>>>>> int i, ret, offset;
>>>>> int *branches_off = NULL;
>>>>> int stack_size = 0, nregs = m->nr_args;
>>>>> - int retaddr_off, fp_off, retval_off, args_off;
>>>>> - int nregs_off, ip_off, run_ctx_off, sreg_off;
>>>>> + int fp_off, retval_off, args_off, nregs_off, ip_off, run_ctx_off, sreg_off;
>>>>> struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
>>>>> struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
>>>>> struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
>>>>> void *orig_call = func_addr;
>>>>> - bool save_ret;
>>>>> + bool save_retval, traced_ret;
>>>>> u32 insn;
>>>>>
>>>>> /* Generated trampoline stack layout:
>>>>> *
>>>>> * FP - 8 [ RA of parent func ] return address of parent
>>>>> * function
>>>>> - * FP - retaddr_off [ RA of traced func ] return address of traced
>>>>> + * FP - 16 [ RA of traced func ] return address of traced
>>>>
>>>> BPF code uses frame pointers. Shouldn't the trampoline frame look like a
>>>> regular frame [1], i.e. start with return address followed by previous
>>>> frame pointer?
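For reference, the "regular frame" Björn describes can be written down as a C struct, assuming RV64 with frame pointers enabled: the two words just below fp hold the caller's fp (at fp - 16) and the return address (at fp - 8). `struct rv_frame_record`, `frame_record_addr()` and `FP_OFFSET_OF()` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Top 16 bytes of a RISC-V frame when frame pointers are used: fp points
 * just past this record. */
struct rv_frame_record {
    uint64_t prev_fp; /* caller's frame pointer, at fp - 16 */
    uint64_t ra;      /* return address,         at fp - 8  */
};

/* Address of the record belonging to a frame whose frame pointer is fp. */
static uintptr_t frame_record_addr(uintptr_t fp)
{
    return fp - sizeof(struct rv_frame_record);
}

/* Offset of a record member relative to fp (always negative). */
#define FP_OFFSET_OF(member)                          \
    ((long)offsetof(struct rv_frame_record, member) - \
     (long)sizeof(struct rv_frame_record))
```

Any frame the trampoline builds has to put these two words in exactly this order for a frame-pointer unwinder to follow it.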
>>>>
>>>
>>> Oops, will fix it. We also need to consider two types of trampoline
>>> stack layout:
>>>
>>> * 1. trampoline called from function entry
>>> * --------------------------------------
>>> * FP + 8 [ RA of parent func ] return address of parent
>>> * function
>>> * FP + 0 [ FP ]
>>> *
>>> * FP - 8 [ RA of traced func ] return address of traced
>>> * function
>>> * FP - 16 [ FP ]
>>> * --------------------------------------
>>> *
>>> * 2. trampoline called directly
>>> * --------------------------------------
>>> * FP - 8 [ RA of caller func ] return address of caller
>>> * function
>>> * FP - 16 [ FP ]
>>> * --------------------------------------
>>
>> Hmm, could you expand a bit on this? The stack frame top 16B (8+8)
>> should follow what the psabi suggests, regardless of the call site?
>>
>
> Maybe I've missed something important, or maybe I'm misunderstanding
> what you mean, but let me show what I see. In my view, we should
> construct a complete stack frame; otherwise one layer of the stack will
> be lost from the call trace when CONFIG_FRAME_POINTER is enabled.
>
> This can be verified with
> `echo 1 > /sys/kernel/debug/tracing/options/stacktrace`; the results
> are shown below:
>
> 1. complete stack frame
> * --------------------------------------
> * FP + 8 [ RA of parent func ] return address of parent
> * function
> * FP + 0 [ FP ]
> *
> * FP - 8 [ RA of traced func ] return address of traced
> * function
> * FP - 16 [ FP ]
> * --------------------------------------
> the stacktrace is:
>
> => trace_event_raw_event_bpf_trace_printk
> => bpf_trace_printk
> => bpf_prog_ad7f62a5e7675635_bpf_prog
> => bpf_trampoline_6442536643
> => do_empty
> => meminfo_proc_show
> => seq_read_iter
> => proc_reg_read_iter
> => copy_splice_read
> => vfs_splice_read
> => splice_direct_to_actor
> => do_splice_direct
> => do_sendfile
> => sys_sendfile64
> => do_trap_ecall_u
> => ret_from_exception
>
> 2. omit one FP
> * --------------------------------------
> * FP + 0 [ RA of parent func ] return address of parent
> * function
> * FP - 8 [ RA of traced func ] return address of traced
> * function
> * FP - 16 [ FP ]
> * --------------------------------------
> the stacktrace is:
>
> => trace_event_raw_event_bpf_trace_printk
> => bpf_trace_printk
> => bpf_prog_ad7f62a5e7675635_bpf_prog
> => bpf_trampoline_6442491529
> => do_empty
> => seq_read_iter
> => proc_reg_read_iter
> => copy_splice_read
> => vfs_splice_read
> => splice_direct_to_actor
> => do_splice_direct
> => do_sendfile
> => sys_sendfile64
> => do_trap_ecall_u
> => ret_from_exception
>
> The 'meminfo_proc_show' frame is lost.
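The lost frame can be reproduced with a small simulation, assuming a plain frame-pointer unwinder that reads ra from fp - 8 and the next fp from fp - 16 until it reaches 0. The addresses and RA_* values are made up; they stand for return addresses into seq_read_iter, meminfo_proc_show and do_empty. Layout 1 links the parent's return address into its own frame record, while layout 2 stores it at FP + 0, where the unwinder never looks.

```c
#include <stdint.h>
#include <string.h>

/* Simulated stack memory, addressed in bytes, 8 bytes per slot. */
static uint64_t mem[32];

static void store(uint64_t addr, uint64_t val) { mem[addr / 8] = val; }
static uint64_t load(uint64_t addr) { return mem[addr / 8]; }

/* Write a psABI-style frame record for a frame whose fp is 'fp'. */
static void push_record(uint64_t fp, uint64_t ra, uint64_t prev_fp)
{
    store(fp - 8, ra);
    store(fp - 16, prev_fp);
}

/* Frame-pointer unwinder: collect saved return addresses until fp == 0. */
static int unwind(uint64_t fp, uint64_t *ras, int max)
{
    int n = 0;

    while (fp != 0 && n < max) {
        ras[n++] = load(fp - 8);
        fp = load(fp - 16);
    }
    return n;
}

/* Hypothetical symbolic return addresses. */
enum {
    RA_DO_EMPTY      = 0x1000, /* into do_empty (traced func, via t0) */
    RA_MEMINFO       = 0x2000, /* into meminfo_proc_show (parent)     */
    RA_SEQ_READ_ITER = 0x3000, /* into seq_read_iter                  */
};

/* Layout 1: extra frame for the traced function, then the trampoline's. */
static int unwind_complete(uint64_t *ras, int max)
{
    memset(mem, 0, sizeof(mem));
    push_record(128, RA_SEQ_READ_ITER, 0); /* meminfo_proc_show; 0 ends chain */
    push_record(112, RA_MEMINFO, 128);     /* extra frame, fp = sp at entry */
    push_record(96, RA_DO_EMPTY, 112);     /* trampoline frame */
    return unwind(96, ras, max);
}

/* Layout 2: parent RA saved at FP + 0 but never linked into a record. */
static int unwind_omitted(uint64_t *ras, int max)
{
    memset(mem, 0, sizeof(mem));
    push_record(128, RA_SEQ_READ_ITER, 0);
    store(104, RA_MEMINFO);                /* FP + 0, invisible to unwinder */
    push_record(104, RA_DO_EMPTY, 128);    /* trampoline frame */
    return unwind(104, ras, max);
}
```

`unwind_complete()` yields three return addresses (do_empty, meminfo_proc_show, seq_read_iter) and `unwind_omitted()` yields two, skipping meminfo_proc_show, which mirrors the two traces above.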
(Lehui was friendly enough to explain the details for me offlist.)
Aha, now I get what you mean! When we're getting into the trampoline
from the fentry-side, an additional stack frame needs to be
created. Otherwise, the unwinding will be incorrect.
So (for the rest of the readers ;-)), the BPF trampoline can be called
from:
A. Tracing: here, we're calling into the trampoline via the
fentry/patchable entry. In this scenario, an additional stack frame
needs to be constructed for proper unwinding.
B. kfuncs: here, the call into the trampoline is just a "regular
call", and no additional stack frame is needed.
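A sketch of how the trampoline prologue could differ between cases A and B, emitted as assembly text for readability. It assumes the fentry path arrives with the parent's return address still in ra and the return address into the traced function in t0 (the T0-based detour from 6724a76cff85), while a direct call arrives with only ra. `enum tramp_entry` and `emit_prologue()` are hypothetical; a real JIT emits encoded instructions, not strings, and stack_size is assumed to be 16-byte aligned and at least 16.

```c
#include <stdio.h>

/* Hypothetical: how the trampoline was entered (cases A and B). */
enum tramp_entry { ENTRY_FENTRY, ENTRY_DIRECT_CALL };

/* Write the prologue into out[], one instruction per row; out must have
 * room for 8 rows. Returns the number of instructions, or -1. */
static int emit_prologue(enum tramp_entry entry, int stack_size,
                         char out[][32], int max)
{
    int n = 0;

    if (max < 8)
        return -1;
    if (entry == ENTRY_FENTRY) {
        /* Case A: frame for the traced function. ra still holds the
         * parent's return address and fp the parent's frame pointer. */
        snprintf(out[n++], 32, "addi sp, sp, -16");
        snprintf(out[n++], 32, "sd ra, 8(sp)");
        snprintf(out[n++], 32, "sd fp, 0(sp)");
        snprintf(out[n++], 32, "addi fp, sp, 16");
    }
    /* The trampoline's own frame: its return address is t0 on the fentry
     * path (back into the traced function) and ra on a direct call. */
    snprintf(out[n++], 32, "addi sp, sp, -%d", stack_size);
    snprintf(out[n++], 32, "sd %s, %d(sp)",
             entry == ENTRY_FENTRY ? "t0" : "ra", stack_size - 8);
    snprintf(out[n++], 32, "sd fp, %d(sp)", stack_size - 16);
    snprintf(out[n++], 32, "addi fp, sp, %d", stack_size);
    return n;
}
```

In case A the second frame saves the fp set up by the first one, so the chain runs trampoline, traced function, parent; in case B there is nothing extra to link.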
@Guo @Song Is the RISC-V ftrace code creating an additional stack frame,
or is the stack unwinding incorrect when the fentry is patched?
Thanks for clearing it up for me, Lehui!
Björn