[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87wn861z7n.fsf@all.your.base.are.belong.to.us>
Date: Mon, 07 Nov 2022 17:55:24 +0100
From: Björn Töpel <bjorn@...nel.org>
To: Chen Guokai <chenguokai17@...ls.ucas.ac.cn>,
paul.walmsley@...ive.com, palmer@...belt.com,
aou@...s.berkeley.edu, rostedt@...dmis.org, mingo@...hat.com,
sfr@...b.auug.org.au
Cc: linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
liaochang1@...wei.com, Chen Guokai <chenguokai17@...ls.ucas.ac.cn>
Subject: Re: [PATCH v4 5/8] riscv/kprobe: Search free register(s) to clobber
for 'AUIPC/JALR'
Chen Guokai <chenguokai17@...ls.ucas.ac.cn> writes:
> From: Liao Chang <liaochang1@...wei.com>
>
> From: Liao Chang <liaochang1@...wei.com>
>
> This patch implement the algorithm of searching free register(s) to
> form a long-jump instruction pair.
>
> AUIPC/JALR instruction pair is introduced with a much wider jump range
> (4GB), where auipc loads the upper 20 bits to a free register and jalr
> appends the lower 12 bits to form a 32 bit immediate. Since kprobes can
> be instrumented at anywhere in kernel space, hence the free register
> should be found in a generic way, not depending on the calling convention
> or any other regulations.
>
> The algorithm for finding the free register is inspired by the register
> renaming in modern processors. From the perspective of register renaming,
> a register could be represented as two different registers if two neighbour
> instructions both write to it but no one ever reads. Extending this fact,
> a register is considered to be free if there is no read before its next
> write in the execution flow. We are free to change its value without
> interfering normal execution.
>
> In order to do jump optimization, it needs to search two free registers,
> the first one is used to form AUIPC/JALR jumping to detour buffer, the
> second one is used to form JR jumping back from detour buffer. If first
> one never been updated by any instructions replaced by 'AUIPC/JALR',
> both register supposes to the same one.
>
> Let's use the example below to explain how the algorithm work. Given
> kernel is RVI and RCV hybrid binary, and one kprobe is instrumented at
> the entry of function idle_dummy.
>
> Before Optimized Detour buffer
> <idle_dummy>: ...
> #1 add sp,sp,-16 auipc a0, #? add sp,sp,-16
> #2 sd s0,8(sp) sd s0,8(sp)
> #3 addi s0,sp,16 jalr a0, #?(a0) addi s0,sp,16
> #4 ld s0,8(sp) ld s0,8(sp)
> #5 li a0,0 li a0,0 auipc a0, #?
> #6 addi sp,sp,16 addi sp,sp,16 jr x0, #?(a0)
> #7 ret ret
>
> For regular kprobe, it is trival to replace the first instruction with
> C.EREABK, no more instruction and register will be clobber, in order to
"C.EBREAK"
> optimize kprobe with long-jump, it used to patch the first 8 bytes with
> AUIPC/JALR, and a0 will be chosen to save the address jumping to,
> because from #1 to #7, a0 is the only one register that satifies two
> conditions: (1) No read before write (2) Never been updated in detour
> buffer. While s0 has been used as the source register at #2, so it is
> not free to clobber.
>
> The searching starts from the kprobe and stop at the last instruction of
> function or the first branch/jump instruction, it decodes out the 'rs'
> and 'rd' part of each visited instruction. If the 'rd' never been read
> before, then record it to bitmask 'write'; if the 'rs' never been
> written before, then record it to another bitmask 'read'. When searching
> stops, the remaining bits of 'write' are the free registers to form
> AUIPC/JALR or JR.
>
AFAIU, the algorithm only tracks registers that are *in use*. You are
already scanning the whole function (next patch). What about the caller
saved registers that are *not* used by the function in the probe range?
Can those, potentially unused, regs be used?
> Signed-off-by: Liao Chang <liaochang1@...wei.com>
> Co-developed-by: Chen Guokai <chenguokai17@...ls.ucas.ac.cn>
> Signed-off-by: Chen Guokai <chenguokai17@...ls.ucas.ac.cn>
> ---
> arch/riscv/kernel/probes/opt.c | 225 ++++++++++++++++++++++++++++++++-
> 1 file changed, 224 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
> index e4a619c2077e..6d23c843832e 100644
> --- a/arch/riscv/kernel/probes/opt.c
> +++ b/arch/riscv/kernel/probes/opt.c
[...]
> +static void arch_find_register(unsigned long start, unsigned long end,
Nit; When I see "arch_" I think it's functionality that can be
overridden per-arch. This is not the case, but just a helper for RV.
[...]
> static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op,
> - int *rd1, int *rd2)
> + int *rd, int *ra)
Nit; Please get rid of this code churn, just name the parameters
correctly on introduction in the previous patch.
[...]
> + *rd = ((kw | ow) == 1UL) ? 0 : __builtin_ctzl((kw | ow) & ~1UL);
> + *ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL);
Hmm, __builtin_ctzl is undefined for 0, right? Can that be triggered
here?
Björn
Powered by blists - more mailing lists