lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 07 Nov 2022 17:55:24 +0100
From:   Björn Töpel <bjorn@...nel.org>
To:     Chen Guokai <chenguokai17@...ls.ucas.ac.cn>,
        paul.walmsley@...ive.com, palmer@...belt.com,
        aou@...s.berkeley.edu, rostedt@...dmis.org, mingo@...hat.com,
        sfr@...b.auug.org.au
Cc:     linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
        liaochang1@...wei.com, Chen Guokai <chenguokai17@...ls.ucas.ac.cn>
Subject: Re: [PATCH v4 5/8] riscv/kprobe: Search free register(s) to clobber
 for 'AUIPC/JALR'

Chen Guokai <chenguokai17@...ls.ucas.ac.cn> writes:

> From: Liao Chang <liaochang1@...wei.com>
>
> From: Liao Chang <liaochang1@...wei.com>
>
> This patch implement the algorithm of searching free register(s) to
> form a long-jump instruction pair.
>
> AUIPC/JALR instruction pair is introduced with a much wider jump range
> (4GB), where auipc loads the upper 20 bits to a free register and jalr
> appends the lower 12 bits to form a 32 bit immediate. Since kprobes can
> be instrumented at anywhere in kernel space, hence the free register
> should be found in a generic way, not depending on the calling convention
> or any other regulations.
>
> The algorithm for finding the free register is inspired by the register
> renaming in modern processors. From the perspective of register renaming,
> a register could be represented as two different registers if two neighbour
> instructions both write to it but no one ever reads. Extending this fact,
> a register is considered to be free if there is no read before its next
> write in the execution flow. We are free to change its value without
> interfering normal execution.
>
> In order to do jump optimization, it needs to search two free registers,
> the first one is used to form AUIPC/JALR jumping to detour buffer, the
> second one is used to form JR jumping back from detour buffer. If first
> one never been updated by any instructions replaced by 'AUIPC/JALR',
> both register supposes to the same one.
>
> Let's use the example below to explain how the algorithm work. Given
> kernel is RVI and RCV hybrid binary, and one kprobe is instrumented at
> the entry of function idle_dummy.
>
> Before			Optimized		Detour buffer
> <idle_dummy>:					...
>  #1 add  sp,sp,-16	auipc a0, #?		add  sp,sp,-16
>  #2 sd   s0,8(sp)				sd   s0,8(sp)
>  #3 addi s0,sp,16	jalr  a0, #?(a0)	addi s0,sp,16
>  #4 ld   s0,8(sp)				ld   s0,8(sp)
>  #5 li   a0,0		li   a0,0		auipc a0, #?
>  #6 addi sp,sp,16	addi sp,sp,16		jr    x0, #?(a0)
>  #7 ret			ret
>
> For regular kprobe, it is trival to replace the first instruction with
> C.EREABK, no more instruction and register will be clobber, in order to

"C.EBREAK"

> optimize kprobe with long-jump, it used to patch the first 8 bytes with
> AUIPC/JALR, and a0 will be chosen to save the address jumping to,
> because from #1 to #7, a0 is the only one register that satifies two
> conditions: (1) No read before write (2) Never been updated in detour
> buffer. While s0 has been used as the source register at #2, so it is
> not free to clobber.
>
> The searching starts from the kprobe and stop at the last instruction of
> function or the first branch/jump instruction, it decodes out the 'rs'
> and 'rd' part of each visited instruction. If the 'rd' never been read
> before, then record it to bitmask 'write'; if the 'rs' never been
> written before, then record it to another bitmask 'read'. When searching
> stops, the remaining bits of 'write' are the free registers to form
> AUIPC/JALR or JR.
>

AFAIU, the algorithm only tracks registers that are *in use*. You are
already scanning the whole function (next patch). What about the caller
saved registers that are *not* used by the function in the probe range?
Can those, potentially unused, regs be used?

> Signed-off-by: Liao Chang <liaochang1@...wei.com>
> Co-developed-by: Chen Guokai <chenguokai17@...ls.ucas.ac.cn>
> Signed-off-by: Chen Guokai <chenguokai17@...ls.ucas.ac.cn>
> ---
>  arch/riscv/kernel/probes/opt.c | 225 ++++++++++++++++++++++++++++++++-
>  1 file changed, 224 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kernel/probes/opt.c b/arch/riscv/kernel/probes/opt.c
> index e4a619c2077e..6d23c843832e 100644
> --- a/arch/riscv/kernel/probes/opt.c
> +++ b/arch/riscv/kernel/probes/opt.c

[...]

> +static void arch_find_register(unsigned long start, unsigned long end,

Nit; When I see "arch_" I think it's functionality that can be
overridden per-arch. This is not the case, but just a helper for RV.

[...]

>  static void find_free_registers(struct kprobe *kp, struct optimized_kprobe *op,
> -				int *rd1, int *rd2)
> +				int *rd, int *ra)

Nit; Please get rid of this code churn, just name the parameters
correctly on introduction in the previous patch.

[...]

> +	*rd = ((kw | ow) == 1UL) ? 0 : __builtin_ctzl((kw | ow) & ~1UL);
> +	*ra = (kw == 1UL) ? 0 : __builtin_ctzl(kw & ~1UL);

Hmm, __builtin_ctzl is undefined for 0, right? Can that be triggered
here?


Björn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ