linux-kernel - Re: [PATCH -next V7 0/7] riscv: Optimize function trace

Message-ID: <CAJF2gTQ6U1vH79Mu53eQ-GVaFx36C-hEt9Qf6=_vAkHfmgFh1Q@mail.gmail.com>
Date:   Tue, 7 Feb 2023 11:57:06 +0800
From:   Guo Ren <guoren@...nel.org>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Evgenii Shatokhin <e.shatokhin@...ro.com>, suagrfillet@...il.com,
        andy.chiu@...ive.com, linux-riscv@...ts.infradead.org,
        linux-kernel@...r.kernel.org, Guo Ren <guoren@...ux.alibaba.com>,
        anup@...infault.org, paul.walmsley@...ive.com, palmer@...belt.com,
        conor.dooley@...rochip.com, heiko@...ech.de, rostedt@...dmis.org,
        mhiramat@...nel.org, jolsa@...hat.com, bp@...e.de,
        jpoimboe@...nel.org, linux@...ro.com
Subject: Re: [PATCH -next V7 0/7] riscv: Optimize function trace

On Mon, Feb 6, 2023 at 5:56 PM Mark Rutland <mark.rutland@....com> wrote:
>
> On Sat, Feb 04, 2023 at 02:40:52PM +0800, Guo Ren wrote:
> > On Mon, Jan 16, 2023 at 11:02 PM Evgenii Shatokhin
> > <e.shatokhin@...ro.com> wrote:
> > >
> > > Hi,
> > >
> > > On 12.01.2023 12:05, guoren@...nel.org wrote:
> > > > From: Guo Ren <guoren@...ux.alibaba.com>
> > > >
> > > > The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
> > > > PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contain three problems.
> > > >
> > > >   - The most horrible bug is preemption panic which found by Andy [1].
> > > >     Let's disable preemption for ftrace first, and Andy could continue
> > > >     the ftrace preemption work.
> > >
> > > It seems, the patches #2-#7 of this series do not require "riscv:
> > > ftrace: Fixup panic by disabling preemption" and can be used without it.
> > >
> > > How about moving that patch out of the series and processing it separately?
> > Okay.
> >
> > >
> > > As it was pointed out in the discussion of that patch, some other
> > > solution to non-atomic changes of the prologue might be needed anyway.
> > I think you mean Mark Rutland's DYNAMIC_FTRACE_WITH_CALL_OPS. But that
> > still needs to be ready. Let's disable PREEMPT for ftrace first.
>
> FWIW, taking the patch to disable FTRACE with PREEMPT for now makes sense to
> me, too.
Thanks for agreeing with that.

>
> The DYNAMIC_FTRACE_WITH_CALL_OPS patches should be in v6.3. They're currently
> queued in the arm64 tree in the for-next/ftrace branch:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/ftrace
>   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/
>
> ... and those *should* be in v6.3.
Glad to hear that. Great!

>
> Patches to implement DIRECT_CALLS atop that are in review at the moment:
>
>   https://lore.kernel.org/linux-arm-kernel/20230201163420.1579014-1-revest@chromium.org/
Good reference. Thanks for sharing.

>
> ... and if riscv uses the CALL_OPS approach, I believe it can do much the same
> there.
>
> If riscv wants to do a single atomic patch to each patch-site (to avoid
> stop_machine()), then direct calls would always need to bounce through the
> ftrace_caller trampoline (and acquire the direct call from the ftrace_ops), but
> that might not be as bad as it sounds -- from benchmarking on arm64, the bulk
> of the overhead seen with direct calls is when using the list_ops or having to
> do a hash lookup, and both of those are avoided with the CALL_OPS approach.
> Calling directly from the patch-site is a minor optimization relative to
> skipping that work.
Yes, CALL_OPS could solve both the preemption and the stop_machine()
problems. I will follow up.

The difference from arm64 is that RISC-V is an ISA with mixed
16-bit/32-bit instructions, so we must keep ftrace_caller &
ftrace_regs_caller in a 2048-aligned place. Then:
FTRACE_UPDATE_MAKE_CALL:
  * addr+00:       NOP         // Literal (first 32 bits)
  * addr+04:       NOP         // Literal (last 32 bits)
  * addr+08: func: auipc t0, ? // All trampolines are in the 2048-aligned
                               // place, so this point won't be changed.
  * addr+12:       jalr ?(t0)  // For different trampolines:
                               // ftrace_regs_caller, ftrace_caller

FTRACE_UPDATE_MAKE_NOP:
  * addr+00:       NOP        // Literal (first 32 bits)
  * addr+04:       NOP        // Literal (last 32 bits)
  * addr+08: func: c.j        // Jump to addr+16, skipping the broken insn & jalr
  * addr+10:       xxx        // Last half of auipc t0, ? (now a broken insn)
  * addr+12:       jalr ?(t0) // To be patched to jalr ?<t0> ()
  * addr+16: func body

Right? (The call site would grow from 64 bits to 128 bits ahead of func.)
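
To make the "auipc stays fixed, only jalr is patched" idea concrete, here is a small sketch of the usual hi20/lo12 split used for an auipc + jalr pair. It is only an illustration, not kernel code: the helper names (split_pcrel, shares_auipc) and all addresses are made up. Whether two trampolines share one auipc immediate depends on the call-site address as well as on where the trampolines sit, so this is a way to check a given layout rather than a proof that the scheme always holds.

```python
def split_pcrel(pc: int, target: int) -> tuple[int, int]:
    """Split (target - pc) into the auipc hi20 and jalr lo12 immediates."""
    offset = target - pc
    # Add 0x800 before shifting so the remaining lo12 fits in [-2048, 2047],
    # which is the signed 12-bit range of the jalr immediate.
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    return hi20, lo12


def shares_auipc(pc: int, targets: list[int]) -> bool:
    """True if every target is reachable from pc with one fixed auipc immediate."""
    return len({split_pcrel(pc, t)[0] for t in targets}) == 1


# Hypothetical addresses, for illustration only.
call_site = 0x8000_2008            # address of the auipc (addr+08 above)
trampolines = {
    "ftrace_caller": 0x8000_0000,
    "ftrace_regs_caller": 0x8000_0100,
}

for name, tramp in trampolines.items():
    hi20, lo12 = split_pcrel(call_site, tramp)
    # The pair must reconstruct the trampoline address exactly.
    assert call_site + (hi20 << 12) + lo12 == tramp
    print(f"{name}: auipc t0, {hi20 & 0xFFFFF:#x}; jalr {lo12}(t0)")

print("same auipc for all trampolines:",
      shares_auipc(call_site, list(trampolines.values())))
```

If shares_auipc() returns False for some call site, switching between the trampolines would need both instructions rewritten, which is the case the layout above tries to avoid.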

>
> Thanks,
> Mark.




--
Best Regards
 Guo Ren