Message-ID: <CAFTtA3Nb66VZGoMVfrVs6cYLorjbSX8GmThyKB1FFvfg2HknpA@mail.gmail.com>
Date: Mon, 30 Dec 2024 03:08:00 +0800
From: Andy Chiu <andybnac@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Björn Töpel <bjorn@...nel.org>, 
	Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>, 
	Albert Ou <aou@...s.berkeley.edu>, Nathan Chancellor <nathan@...nel.org>, 
	Nick Desaulniers <ndesaulniers@...gle.com>, Bill Wendling <morbo@...gle.com>, 
	Justin Stitt <justinstitt@...gle.com>, linux-kernel@...r.kernel.org, 
	linux-riscv@...ts.infradead.org, llvm@...ts.linux.dev, bjorn@...osinc.com, 
	puranjay12@...il.com, alexghiti@...osinc.com, yongxuan.wang@...ive.com, 
	greentime.hu@...ive.com, nick.hu@...ive.com, nylon.chen@...ive.com, 
	tommy.wu@...ive.com, eric.lin@...ive.com, viccent.chen@...ive.com, 
	zong.li@...ive.com, samuel.holland@...ive.com
Subject: Re: [PATCH v3 0/7] riscv: ftrace: atomic patching and preempt improvements

Steven Rostedt <rostedt@...dmis.org> wrote on Tue, Dec 24, 2024 at 11:15 AM:
>
> On Wed, 27 Nov 2024 22:25:57 +0100
> Björn Töpel <bjorn@...nel.org> wrote:
>
> > Adding Steven.
>
> And this has been in my draft folder for almost a month :-p
>
> I kept coming back to this email, but then got distracted by something else.
>
> >
> > Andy Chiu <andybnac@...il.com> writes:
> >
> > > This series makes atomic code patching possible in riscv ftrace. A
> > > direct benefit of this is that we can get rid of stop_machine() when
> > > patching function entries. This also makes it possible to run ftrace
> > > with full kernel preemption. Before this series, the kernel initializes
> > > patchable function entries to NOP4 + NOP4. To start tracing, it updates
> > > entries to AUIPC + JALR while holding other cores in stop_machine().
> > > stop_machine() is required because two instructions cannot be updated
> > > and observed atomically. Preemption must also be prevented, because
> > > kernel preemption allows a process to be scheduled out while executing
> > > one of these instruction pairs.
> > >
> > > This series addresses the problem by initializing the first NOP4 to
> > > AUIPC. So, atomic patching is possible because the kernel only has to
> > > update one instruction. As long as that instruction is naturally
> > > aligned, it is expected to be updated atomically.
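
To make the idea concrete, here is a minimal userspace-style sketch (an
illustration only, not the kernel code in this series; the helper name is
made up, and only the NOP encoding is the real one):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustration only: the AUIPC written at init never changes, so enabling
 * or disabling tracing toggles a single naturally aligned 32-bit slot
 * between NOP and JALR, and one atomic store replaces stop_machine().
 */
#define RISCV_INSN_NOP4	0x00000013u	/* addi x0, x0, 0 */

static void toggle_patch_slot(_Atomic uint32_t *slot, uint32_t jalr_insn,
			      bool enable)
{
	atomic_store_explicit(slot, enable ? jalr_insn : RISCV_INSN_NOP4,
			      memory_order_release);
	/* the real kernel additionally has to flush the icache afterwards */
}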
> > >
> > > However, the address range of the ftrace trampoline is limited to +-2K
> > > from ftrace_caller after applying this series. This issue is expected
> > > to be solved by Puranjay's CALL_OPS, which adds 8B of naturally aligned
> > > data in front of patchable functions and can use it to direct execution
> > > out to any custom trampoline.
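
The +-2K limit comes from the ISA: once the AUIPC half is fixed, only the
JALR's 12-bit signed immediate can vary. Roughly (illustrative arithmetic
only, not code from the series):

#include <stdint.h>

/* AUIPC computes t0 = pc + (hi20 << 12); JALR then jumps to t0 + lo12,
 * with lo12 limited to [-2048, 2047]. With hi20 fixed at init, every
 * reachable target stays within about +-2K of where the AUIPC points,
 * i.e. near ftrace_caller. */
static uintptr_t auipc_jalr_target(uintptr_t pc, int32_t hi20, int32_t lo12)
{
	uintptr_t t0 = pc + ((intptr_t)hi20 << 12);	/* AUIPC */
	return (t0 + lo12) & ~(uintptr_t)1;		/* JALR clears bit 0 */
}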
> > >
> > > The series is composed of three parts. The first part cleans up the
> > > existing issues when the kernel is compiled with clang. The second part
> > > modifies the ftrace code patching mechanism (2-4) as mentioned above.
> > > The third prepares ftrace to run with kernel preemption (5,6).
> > >
> > > An ongoing fix:
> > >
> > > Since there is no room for marking *kernel_text_address as notrace[1] at
> > > the source code level, there is a significant performance regression when
> > > using function_graph with TRACE_IRQFLAGS enabled: as many as 8 graph
> > > handlers can be called on each function entry. The current workaround
> > > requires us to echo "*kernel_text_address" into set_ftrace_notrace before
> > > starting the trace. However, we observed that the kernel still enables
> > > the patch sites in some cases even with *kernel_text_address properly
> > > added to the file. While the root cause is still under investigation, we
> > > consider that it should not hold back the code patching, in order to
> > > unblock the call_ops part.
> >
> > Maybe Steven knows this off the top of his head!
> >
> > As Andy points out, "*kernel_text_address" is used in the stack
> > unwinding on RISC-V. So, if you do tracing without filtering *and* with
> > TRACE_IRQFLAGS enabled, one will drown in traces.
>
> I tested set_ftrace_notrace on x86 and the function graph tracer does honor
> it. I wonder if there's a kernel

After checking the log buffer, I can confirm that riscv also honors
set_ftrace_notrace: it does not print "*kernel_text_address" in the
trace buffer. Sorry for not making this clear.

However, the problem is that the patch sites for "*kernel_text_address"
were enabled, allowing the code to flow into ftrace and parts of fgraph.
In particular, with TRACE_IRQFLAGS enabled, these functions are called
extensively, causing the significant slowdown.

IIUC, it is reasonable to enable a patch site if there is at least one
tracing instance that requests the function. However, in another
experiment, the patch sites of "*kernel_text_address" are enabled even
when every instance has them excluded via set_ftrace_notrace. Here is a
way to reproduce it:

cd /sys/kernel/
mount -t tracefs t tracing/
cd tracing/
mkdir instances/1
mkdir instances/2
cd instances/1/
echo '*kernel_text_address' > set_ftrace_notrace
echo '*sched*' > set_ftrace_filter  # this is just to make the kernel more responsive
cat set_ftrace_notrace
echo function_graph > current_tracer
cd ../2/
echo '*kernel_text_address' > set_ftrace_notrace
cat set_ftrace_notrace
echo function_graph > current_tracer

To figure out this problem, it is probably worth understanding why the
patch sites are enabled after the above sequence:

It seems like the ftrace_ops being passed into
__ftrace_hash_rec_update() at the start of the second tracer has an
empty notrace hash, which leads to the eventual activation of the patch
sites. However, graph_ops does not have an empty notrace hash when the
user starts the first graph tracer. I am trying to understand this part
of the code, and the relationship between graph_ops and the ops in the
fgraph_array.
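
As a mental model of why an empty notrace hash enables everything, here is
a grossly simplified sketch in plain C (hypothetical types, not the
kernel's ftrace_hash / ftrace_ops structures):

#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified rule applied per ftrace_ops: a record is wanted when the
 * filter hash is empty (meaning "all functions") or contains the ip, AND
 * the notrace hash does not contain the ip. An ops that shows up with an
 * empty notrace hash therefore excludes nothing, so the
 * "*kernel_text_address" patch sites get enabled by that ops even though
 * another instance still lists them in set_ftrace_notrace.
 */
struct simple_hash {
	const unsigned long *ips;
	size_t count;			/* count == 0 models an empty hash */
};

static bool hash_contains(const struct simple_hash *h, unsigned long ip)
{
	for (size_t i = 0; i < h->count; i++)
		if (h->ips[i] == ip)
			return true;
	return false;
}

static bool ops_wants_rec(const struct simple_hash *filter,
			  const struct simple_hash *notrace, unsigned long ip)
{
	if (filter->count && !hash_contains(filter, ip))
		return false;
	return !hash_contains(notrace, ip);
}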

>
> >
> > E.g. the ftrace selftest:
> >   | $ ./ftracetest -vvv test.d/ftrace/fgraph-multi.tc
> >
> > will generate a lot of traces.
> >
> > Now, if we add:
> > --8<--
> > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > index ff88f97e41fb..4f30a4d81d99 100644
> > --- a/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/fgraph-multi.tc
> > @@ -84,6 +84,7 @@ cd $INSTANCE2
> >  do_test '*rcu*' 'rcu'
> >  cd $WD
> >  cd $INSTANCE3
> > +echo '*kernel_text_address' > set_ftrace_notrace
> >  echo function_graph > current_tracer
> >
> >  sleep 1
> > -->8--
> >
> > The graph tracer will not honor the "set_ftrace_notrace" in $INSTANCE3,
> > but will still enable the *kernel_text_address traces. (Note that there
> > are no filters in the test, so *all* ftrace recs will be enabled.)
> >
> > Are we holding the graph tracer wrong?
>
> What do you get when you do:
>
>   # grep kernel_text_address available_filter_functions
>
> ?

We get:
kernel_text_address
__kernel_text_address

>
> >
> >
> > Happy Thanksgiving!
> > Björn
>
> And Merry Christmas!
>
> -- Steve
>

And Happy New Year!


Andy
