lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 9 Feb 2023 09:31:25 +0800
From:   Guo Ren <guoren@...nel.org>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Evgenii Shatokhin <e.shatokhin@...ro.com>, suagrfillet@...il.com,
        andy.chiu@...ive.com, linux-riscv@...ts.infradead.org,
        linux-kernel@...r.kernel.org, Guo Ren <guoren@...ux.alibaba.com>,
        anup@...infault.org, paul.walmsley@...ive.com, palmer@...belt.com,
        conor.dooley@...rochip.com, heiko@...ech.de, rostedt@...dmis.org,
        mhiramat@...nel.org, jolsa@...hat.com, bp@...e.de,
        jpoimboe@...nel.org, linux@...ro.com
Subject: Re: [PATCH -next V7 0/7] riscv: Optimize function trace

On Wed, Feb 8, 2023 at 10:46 PM Mark Rutland <mark.rutland@....com> wrote:
>
> On Wed, Feb 08, 2023 at 10:30:56AM +0800, Guo Ren wrote:
> > Hi Mark,
> >
> > Thx for the thoughtful reply.
> >
> > On Tue, Feb 7, 2023 at 5:17 PM Mark Rutland <mark.rutland@....com> wrote:
> > > Note that I'm assuming you will *always* go through a common ftrace_caller
> > > trampoline (even for direct calls), with the trampoline responsible for
> > > recovering the direct trampoline (or ops->func) from the ops pointer.
> > >
> > > That would only require 64-bit alignment on 64-bit (or 32-bit alignment on
> > > 32-bit) to keep the literal naturally-aligned; the rest of the instructions
> > > wouldn't require additional alignment.
> > >
> > > For example, I would expect that (for 64-bit) you'd use:
> > >
> > >   # place 2 NOPs *immediately before* the function, and 3 NOPs at the start
> > >   -fpatchable-function-entry=5,2
> > >
> > >   # Align the function to 8-bytes
> > >   -falign=functions=8
> > >
> > > ... and your trampoline in each function could be initialized to:
> > >
> > >   # Note: aligned to 8 bytes
> > >   addr-08               // Literal (first 32-bits)      // set to ftrace_nop_ops
> > >   addr-04               // Literal (last 32-bits)       // set to ftrace_nop_ops
> > >   addr+00       func:   mv      t0, ra
> > >   addr+04               auipc   t1, ftrace_caller
> > >   addr+08               nop
> > >
> > > ... and when enabled can be set to:
> > >
> > >   # Note: aligned to 8 bytes
> > >   addr-08               // Literal (first 32-bits)      // patched to ops ptr
> > >   addr-04               // Literal (last 32-bits)       // patched to ops ptr
> > >   addr+00       func:   mv      t0, ra
> > We needn't "mv t0, ra" here because our "jalr" could work with t0 and
> > won't affect ra. Let's do it in the trampoline code, and then we can
> > save another word here.
>
> Ah; I thought JALR always clobbered ra? Or can that specify the register to
> save the link address to?
Yes, that's the feature of riscv :) We could use any register to save
the link address.

>
> I'm not that familiar with riscv asm, so I've probably just got that wrong.
>
> > >   addr+04               auipc   t1, ftrace_caller
> > >   addr+08               jalr    ftrace_caller(t1)
> >
> > Here is the call-site:
> >    # Note: aligned to 8 bytes
> >    addr-08               // Literal (first 32-bits)      // patched to ops ptr
> >    addr-04               // Literal (last 32-bits)       // patched to ops ptr
> >    addr+00               auipc   t0, ftrace_caller
> >    addr+04               jalr    ftrace_caller(t0)
Sorry, it should be:
         addr+04               jalr    t0, ftrace_caller(t0)

>
> I'm a bit confused there; I thought that the `symbol(reg)` addressing mode was
> generating additional bits that the AUPIC didn't -- have I got that wrong?
>
> What specifies which register the JALR will write the link address to?
According to the spec, auipc t1,0x0 should write PC + 0x0<<12 (which
is equal to PC) to t1 and then jalr t0, (t0)0 jumps to the address
stored in t0 + 0x0 and stores the return address to t0.

That means auipc defines xxx << 12 bits, jalr defines lowest 12 bits.

>
> > > Note: this *only* requires patching the literal and NOP<->JALR; the MV and
> > > AUIPC aren't harmful and can always be there. This way, you won't need to use
> > > stop_machine().
> > Yes, simplest nop is better than c.j. I confused.
> >
> > >
> > > With that, the ftrace_caller trampoline can recover the `ops` pointer at a
> > > negative offset from `ra`, and can recover the instrumented function's return
> > > address in `t0`. Using the `ops` pointer, it can figure out whether to branch
> > > to a direct trampoline or whether to save/restore the regs around invoking
> > > ops->func.
> > >
> > > For 32-bit it would be exactly the same, except you'd only need a single nop
> > > before the function, and the offset would be -0x10.
> > Yes, we reduced another 4 bytes & a smaller alignment for better code
> > size when 32-bit.
> >    # Note: aligned to 4 bytes
> >    addr-04               // Literal (last 32-bits)       // patched to ops ptr
> >    addr+00               auipc   t0, ftrace_caller
> >    addr+04               jalr    ftrace_caller(t0)
addr+04               jalr    t0, ftrace_caller(t0)

> > >
>
> > > That's what arm64 does; the only difference is that riscv would *always* need
> > > to go via the trampoline in order to make direct calls.
> > We need one more trampoline here beside ftrace_caller &
> > ftrace_regs_caller: It's "direct_caller".
> >
> > addr+04         nop -> direct_caller/ftrace_caller/ftrace_regs_caller
>
> I'd strongly recommend that you instead implement FTRACE_WITH_ARGS and
> deprecate FTRACE_WITH_REGS, like arm64 has done, then you only need a single
> ftrace_caller, as I mentioned above. That way there's no risk that you need to
> patch the AUIPC after initialization.
>
> The arm64 FTRACE_WITH_ARGS conversion is in mainline, and arm64's
> FTRACE_WITH_CALL_OPS is based upon that. Florent's DIRECT_CALLS patches add the
> direct call logic to the same ftrace_caller trampoline.
Thx for the suggestion of only keeping the ftrace_caller idea, but
it's another topic.

What I want to point out:
If we keep "auipc (addr+00)" fixed, we could use the different
trampolines at "jalr (addr+0x4)" (All of them must be in one 2k
aligned area).

>
> Thanks,
> Mark.



-- 
Best Regards
 Guo Ren

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ