lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 7 Mar 2024 20:51:44 +0100
From: Puranjay Mohan <puranjay12@...il.com>
To: Björn Töpel <bjorn@...nel.org>
Cc: Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>, 
	Albert Ou <aou@...s.berkeley.edu>, Steven Rostedt <rostedt@...dmis.org>, 
	Masami Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Sami Tolvanen <samitolvanen@...gle.com>, Guo Ren <guoren@...nel.org>, 
	Ley Foon Tan <leyfoon.tan@...rfivetech.com>, Deepak Gupta <debug@...osinc.com>, 
	Sia Jee Heng <jeeheng.sia@...rfivetech.com>, Björn Töpel <bjorn@...osinc.com>, 
	Song Shuai <suagrfillet@...il.com>, Clément Léger <cleger@...osinc.com>, 
	Al Viro <viro@...iv.linux.org.uk>, Jisheng Zhang <jszhang@...nel.org>, 
	linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS

Hi Björn,

On Thu, Mar 7, 2024 at 8:27 PM Björn Töpel <bjorn@...nel.org> wrote:
>
> Puranjay!
>
> Puranjay Mohan <puranjay12@...il.com> writes:
>
> > This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
> > This allows each ftrace callsite to provide an ftrace_ops to the common
> > ftrace trampoline, allowing each callsite to invoke distinct tracer
> > functions without the need to fall back to list processing or to
> > allocate custom trampolines for each callsite. This significantly speeds
> > up cases where multiple distinct trace functions are used and callsites
> > are mostly traced by a single tracer.
> >
> > The idea and most of the implementation is taken from the ARM64's
> > implementation of the same feature. The idea is to place a pointer to
> > the ftrace_ops as a literal at a fixed offset from the function entry
> > point, which can be recovered by the common ftrace trampoline.
>
> Not really a review, but some more background; Another rationale (on-top
> of the improved per-call performance!) for CALL_OPS was to use it to
> build ftrace direct call support (which BPF uses a lot!). Mark, please
> correct me if I'm lying here!
>
> On Arm64, CALL_OPS makes it possible to implement direct calls, while
> only patching one BL instruction -- nice!
>
> On RISC-V we cannot use use the same ideas as Arm64 straight off,
> because the range of jal (compare to BL) is simply too short (+/-1M).
> So, on RISC-V we need to use a full auipc/jal pair (the text patching
> story is another chapter, but let's leave that aside for now). Since we
> have to patch multiple instructions, the cmodx situation doesn't really
> improve with CALL_OPS.
>
> Let's say that we continue building on your patch and implement direct
> calls on CALL_OPS for RISC-V as well.
>
> From Florent's commit message for direct calls:
>
>   |    There are a few cases to distinguish:
>   |    - If a direct call ops is the only one tracing a function:
>   |      - If the direct called trampoline is within the reach of a BL
>   |        instruction
>   |         -> the ftrace patchsite jumps to the trampoline
>   |      - Else
>   |         -> the ftrace patchsite jumps to the ftrace_caller trampoline which
>   |            reads the ops pointer in the patchsite and jumps to the direct
>   |            call address stored in the ops
>   |    - Else
>   |      -> the ftrace patchsite jumps to the ftrace_caller trampoline and its
>   |         ops literal points to ftrace_list_ops so it iterates over all
>   |         registered ftrace ops, including the direct call ops and calls its
>   |         call_direct_funcs handler which stores the direct called
>   |         trampoline's address in the ftrace_regs and the ftrace_caller
>   |         trampoline will return to that address instead of returning to the
>   |         traced function
>
> On RISC-V, where auipc/jalr is used, the direct called trampoline would
> always be reachable, and then first Else-clause would never be entered.
> This means the the performance for direct calls would be the same as the
> one we have today (i.e. no regression!).
>
> RISC-V does like x86 does (-ish) -- patch multiple instructions, long
> reach.
>
> Arm64 uses CALL_OPS and patch one instruction BL.
>
> Now, with this background in mind, compared to what we have today,
> CALL_OPS would give us (again assuming we're using it for direct calls):
>
> * Better performance for tracer per-call (faster ops lookup) GOOD

^ this was the only motivation for me to implement this patch.

I don't think implementing direct calls over call ops is fruitful for
RISC-V because once
the auipc/jalr can be patched atomically, the direct call trampoline
is always reachable.
Solving the atomic text patching problem would be fun!! I am eager to
see how it will be
solved.

> * Larger text size (function alignment + extra nops) BAD
> * Same direct call performance NEUTRAL
> * Same complicated text patching required NEUTRAL
>
> It would be interesting to see how the per-call performance would
> improve on x86 with CALL_OPS! ;-)

If I remember from Steven's talk, x86 uses dynamically allocated trampolines
for per callsite tracers, would CALL_OPS provide better performance than that?

>
> I'm trying to wrap my head if it makes sense to have it on RISC-V, given
> that we're a bit different from Arm64. Does the scale tip to the GOOD
> side?
>
> Oh, and we really need to see performance numbers on real HW! I have a
> VF2 that I could try this series on.

It would be great if you can do it :D.

Thanks,
Puranjay

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ