lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEEvPLBi24RQ5gG3NCR4B2f5wEpnoYu_YSki6yo5=PZgoKcuzg@mail.gmail.com>
Date: Mon, 9 Dec 2024 15:57:13 +0100
From: Robbin Ehn <rehn@...osinc.com>
To: Björn Töpel <bjorn@...nel.org>
Cc: Andy Chiu <andybnac@...il.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Masami Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>, 
	Albert Ou <aou@...s.berkeley.edu>, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org, 
	bjorn@...osinc.com, puranjay12@...il.com, alexghiti@...osinc.com, 
	yongxuan.wang@...ive.com, greentime.hu@...ive.com, nick.hu@...ive.com, 
	nylon.chen@...ive.com, tommy.wu@...ive.com, eric.lin@...ive.com, 
	viccent.chen@...ive.com, zong.li@...ive.com, samuel.holland@...ive.com
Subject: Re: [PATCH v3 3/7] riscv: ftrace: prepare ftrace for atomic code patching

On Fri, Dec 6, 2024 at 11:02 AM Björn Töpel <bjorn@...nel.org> wrote:
>
> Adding Robbin for input, who's doing much more crazy text patching in
> JVM, than what we do in the kernel. ;-)
>
> Andy Chiu <andybnac@...il.com> writes:
>
> > From: Andy Chiu <andy.chiu@...ive.com>
> >
> > We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
> > instruction fetch can break down to 4 byte at a time, it is impossible
> > to update two instructions without a race. In order to mitigate it, we
> > initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
> > patching can change NOP4 to JALR to eable/disable ftrcae from a
> > function. This limits the reach of each ftrace entry to +-2KB displacing
> > from ftrace_caller.
> >
> > Starting from the trampoline, we add a level of indirection for it to
> > reach ftrace caller target. Now, it loads the target address from a
> > memory location, then perform the jump. This enable the kernel to update
> > the target atomically.
> >
> > The ordering of reading/updating the targert address should be guarded
> > by generic ftrace code, where it sends smp_rmb ipi.
>
> Let's say we're tracing "f". Previously w/ stop_machine() it was
> something like:
>
> f:
> 1: nop
>    nop
>    ...
>    ...
>
> ftrace_caller:
>    ...
>    auipc        a2, function_trace_op
>    ld           a2, function_trace_op(a2)
>    ...
> 2: auipc    ra, ftrace_stub
>    jalr     ftrace_stub(ra)
>
> The text was patched by ftrace in 1 and 2.
>
> ...and now:
> f:
>    auipc    t0, ftrace_caller
> A: nop
>    ...
>    ...
>
> ftrace_caller:
>    ...
>    auipc    a2, function_trace_op
>    ld       a2, function_trace_op(a2)
>    ...
>    auipc    ra, ftrace_call_dest
>    ld       ra, ftrace_call_dest(ra)
>    jalr     ra
>
> The text is only patched in A, and the tracer func is loaded via
> ftrace_call_dest.
>
> Today, when we enable trace "f" the following is done by ftrace:
>   Text patch 2: call ftrace_stub -> call arch_ftrace_ops_list_func
>   Text patch 1: nop,nop -> call ftrace_caller
>   store function_trace_op
>   smp_wmb()
>   IPI: smp_rmb()
>   Text patch 2: call arch_ftrace_ops_list_func -> call function_trace_call
>
> Disable, would be something like:
>   Text patch 2: call function_trace_call -> call arch_ftrace_ops_list_func
>   Text patch 1: call ftrace_caller -> nop,nop
>   store function_trace_op
>   smp_wmb()
>   IPI: smp_rmb()
>   Text patch 2: call arch_ftrace_ops_list_func -> call ftrace_stub
>
> Now with this change, enable would be:
>   store ftrace_call_dest (was: Text patch 2: call ftrace_stub -> call arch_ftrace_ops_list_func)
>   <<ORDERING>>
>   Text patch A: nop -> jalr ftrace_caller(t0)
>   store function_trace_op
>   smp_wmb()
>   IPI: smp_rmb()
>   store ftrace_call_dest (was: Text patch 2: call arch_ftrace_ops_list_func -> call function_trace_call)
>
> Seems like we're missing some data ordering in "<<ORDERING>>", wrt to
> the text patching A (The arch specific ftrace_update_ftrace_func())? Or
> are we OK with reordering there? Maybe add what's done for
> function_trace_op?
>
> [...]
>

Hi, so we allow reordering of the following 3 stores (set via
ftrace_modify_all_code()):

ftrace_call_dest = ftrace_ops_list_func
Instruction patch NOP -> JALR
function_trace_op = set_function_trace_op
<data-ordering>
ftrace_call_dest = ftrace_trace_function
<ins-ordering>

>From what I can tell all combinations will be fine as trace OP is not
read and ftrace_call_dest should be ftrace_stub in such reordering
case.
It looks like, as we do this under lock (should be an lockdep assert
in ftrace_modify_all_code for ftrace_lock), we only go from:
n tracers => 0 tracers
0 tracers => n tracers

Meaning we never add and remove tracers in the same update, so this
reordering seems fine.
Otherwise we could pass an OP for an old tracer into a new tracer.
(function_trace_op happens before ftrace_call_dest store)

As the function_trace_op can be concurrently loaded via ftrace_caller
it should thus be stored with WRITE_ONCE for good measure.

> > Signed-off-by: Andy Chiu <andy.chiu@...ive.com>
> > ---
> >  arch/riscv/include/asm/ftrace.h |  4 ++
> >  arch/riscv/kernel/ftrace.c      | 80 +++++++++++++++++++++------------
> >  arch/riscv/kernel/mcount-dyn.S  |  9 ++--
> >  3 files changed, 62 insertions(+), 31 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
> > index 4ca7ce7f34d7..36734d285aad 100644
> > --- a/arch/riscv/include/asm/ftrace.h
> > +++ b/arch/riscv/include/asm/ftrace.h
> > @@ -80,6 +80,7 @@ struct dyn_arch_ftrace {
> >  #define JALR_T0                      (0x000282e7)
> >  #define AUIPC_T0             (0x00000297)
> >  #define NOP4                 (0x00000013)
> > +#define JALR_RANGE           (JALR_SIGN_MASK - 1)
> >
> >  #define to_jalr_t0(offset)                                           \
> >       (((offset & JALR_OFFSET_MASK) << JALR_SHIFT) | JALR_T0)
> > @@ -117,6 +118,9 @@ do {                                                                      \
> >   * Let auipc+jalr be the basic *mcount unit*, so we make it 8 bytes here.
> >   */
> >  #define MCOUNT_INSN_SIZE 8
> > +#define MCOUNT_AUIPC_SIZE    4
> > +#define MCOUNT_JALR_SIZE     4
> > +#define MCOUNT_NOP4_SIZE     4
>
> Align please.
>
> >
> >  #ifndef __ASSEMBLY__
> >  struct dyn_ftrace;
> > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > index 4b95c574fd04..5ebe412280ef 100644
> > --- a/arch/riscv/kernel/ftrace.c
> > +++ b/arch/riscv/kernel/ftrace.c
> > @@ -64,42 +64,64 @@ static int ftrace_check_current_call(unsigned long hook_pos,
> >       return 0;
> >  }
> >
> > -static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
> > -                             bool enable, bool ra)
> > +static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target, bool validate)
>
> While we're updating this function; Can we rename hook_pos to something
> that makes sense from an ftrace perspective?
>
> >  {
> >       unsigned int call[2];
> > -     unsigned int nops[2] = {NOP4, NOP4};
> > +     unsigned int replaced[2];
> > +
> > +     make_call_t0(hook_pos, target, call);

If you use to_jalr_t0 it's easier to read. (maybe remove make_call_t0).

> >
> > -     if (ra)
> > -             make_call_ra(hook_pos, target, call);
> > -     else
> > -             make_call_t0(hook_pos, target, call);
> > +     if (validate) {
> > +             /*
> > +              * Read the text we want to modify;
> > +              * return must be -EFAULT on read error
> > +              */

Use to_auipc_t0 for validation.

> > +             if (copy_from_kernel_nofault(replaced, (void *)hook_pos,
> > +                                          MCOUNT_INSN_SIZE))
>
> Don't wrap this line.
>
> > +                     return -EFAULT;
> > +
> > +             if (replaced[0] != call[0]) {
> > +                     pr_err("%p: expected (%08x) but got (%08x)\n",
> > +                            (void *)hook_pos, call[0], replaced[0]);
> > +                     return -EINVAL;
> > +             }
> > +     }
> >
> > -     /* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
> > -     if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
> > +     /* Replace the jalr at once. Return -EPERM on write error. */
> > +     if (patch_insn_write((void *)(hook_pos + MCOUNT_AUIPC_SIZE), call + 1, MCOUNT_JALR_SIZE))
> >               return -EPERM;
> >
> >       return 0;
> >  }
> >
> > -int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> > +static int __ftrace_modify_call_site(ftrace_func_t *hook_pos, ftrace_func_t target, bool enable)

As the bool value enable is hardcoded to true/false I would just have
two functions.
IMHO the name ftrace_modify_call_site() makes little sense, especially
since there is a ftrace_modify_call().

> >  {
> > -     unsigned int call[2];
> > +     ftrace_func_t call = target;
> > +     ftrace_func_t nops = &ftrace_stub;
>
> Confusing to call nops. This is not nops. This is the ftrace_stub. Also
> the __ftrace_modify_call_site is not super clear to me. Maybe just ditch
> the enable flag, and have two functions? Or just or inline it?
>
> >
> > -     make_call_t0(rec->ip, addr, call);
> > -
> > -     if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
> > -             return -EPERM;
> > +     WRITE_ONCE(*hook_pos, enable ? call : nops);
> >
> >       return 0;
> >  }
> >
> > +int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> > +{
> > +     unsigned long distance, orig_addr;
> > +
> > +     orig_addr = (unsigned long)&ftrace_caller;
> > +     distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
> > +     if (distance > JALR_RANGE)
> > +             return -EINVAL;
> > +
> > +     return __ftrace_modify_call(rec->ip, addr, false);
> > +}
> > +
> >  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> >                   unsigned long addr)
> >  {
> > -     unsigned int nops[2] = {NOP4, NOP4};
> > +     unsigned int nops[1] = {NOP4};
>
> It's just one nop, not nops. No biggie, but why array?
>
> >
> > -     if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
> > +     if (patch_insn_write((void *)(rec->ip + MCOUNT_AUIPC_SIZE), nops, MCOUNT_NOP4_SIZE))
> >               return -EPERM;
> >
> >       return 0;
> > @@ -114,21 +136,23 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
> >   */
> >  int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
> >  {
> > +     unsigned int nops[2];
> >       int out;
> >
> > +     make_call_t0(rec->ip, &ftrace_caller, nops);
> > +     nops[1] = NOP4;

Use to_auipc_t0.

> > +
> >       mutex_lock(&text_mutex);
> > -     out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > +     out = patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE);
> >       mutex_unlock(&text_mutex);
> >
> >       return out;
> >  }
> >
> > +ftrace_func_t ftrace_call_dest = ftrace_stub;
> >  int ftrace_update_ftrace_func(ftrace_func_t func)
> >  {
> > -     int ret = __ftrace_modify_call((unsigned long)&ftrace_call,
> > -                                    (unsigned long)func, true, true);
> > -
> > -     return ret;
> > +     return __ftrace_modify_call_site(&ftrace_call_dest, func, true);
> >  }
> >
> >  struct ftrace_modify_param {
> > @@ -182,7 +206,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
> >       if (ret)
> >               return ret;
> >
> > -     return __ftrace_modify_call(caller, addr, true, false);
> > +     return __ftrace_modify_call(caller, addr, true);
> >  }
> >  #endif
> >
> > @@ -217,17 +241,17 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
> >       prepare_ftrace_return(&fregs->ra, ip, fregs->s0);
> >  }
> >  #else /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
> > -extern void ftrace_graph_call(void);
> > +ftrace_func_t ftrace_graph_call_dest = ftrace_stub;
> >  int ftrace_enable_ftrace_graph_caller(void)
> >  {
> > -     return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> > -                                 (unsigned long)&prepare_ftrace_return, true, true);
> > +     return __ftrace_modify_call_site(&ftrace_graph_call_dest,
> > +                                      &prepare_ftrace_return, true);
> >  }
> >
> >  int ftrace_disable_ftrace_graph_caller(void)
> >  {
> > -     return __ftrace_modify_call((unsigned long)&ftrace_graph_call,
> > -                                 (unsigned long)&prepare_ftrace_return, false, true);
> > +     return __ftrace_modify_call_site(&ftrace_graph_call_dest,
> > +                                      &prepare_ftrace_return, false);
> >  }
> >  #endif /* CONFIG_DYNAMIC_FTRACE_WITH_ARGS */
> >  #endif /* CONFIG_DYNAMIC_FTRACE */
> > diff --git a/arch/riscv/kernel/mcount-dyn.S b/arch/riscv/kernel/mcount-dyn.S
> > index e988bd26b28b..bc06e8ab81cf 100644
> > --- a/arch/riscv/kernel/mcount-dyn.S
> > +++ b/arch/riscv/kernel/mcount-dyn.S
> > @@ -162,7 +162,8 @@ SYM_FUNC_START(ftrace_caller)
> >       mv      a3, sp
> >
> >  SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> > -     call    ftrace_stub
> > +     REG_L   ra, ftrace_call_dest
> > +     jalr    0(ra)

I would write these as  "jalr    ra,0(ra)", as it may not be obvious.

Nice improvement, thanks!

/Robbin

> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >       addi    a0, sp, ABI_RA
> > @@ -172,7 +173,8 @@ SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
> >       mv      a2, s0
> >  #endif
> >  SYM_INNER_LABEL(ftrace_graph_call, SYM_L_GLOBAL)
> > -     call    ftrace_stub
> > +     REG_L   ra, ftrace_graph_call_dest
> > +     jalr    0(ra)
> >  #endif
> >       RESTORE_ABI
> >       jr      t0
> > @@ -185,7 +187,8 @@ SYM_FUNC_START(ftrace_caller)
> >       PREPARE_ARGS
> >
> >  SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
>
> Not used, please remove.
>
> > -     call    ftrace_stub
> > +     REG_L   ra, ftrace_call_dest
> > +     jalr    0(ra)
> >
> >       RESTORE_ABI_REGS
> >       bnez    t1, .Ldirect
> > --
> > 2.39.3 (Apple Git-145)
>
>
>
> Björn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ