lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 11 May 2023 11:56:23 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     Leonardo Bras <leobras@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Leonardo Bras <leobras@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Juergen Gross <jgross@...e.com>,
        Yury Norov <yury.norov@...il.com>,
        Chen Zhongjin <chenzhongjin@...wei.com>,
        Zhen Lei <thunder.leizhen@...wei.com>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Nadav Amit <namit@...are.com>,
        Daniel Bristot de Oliveira <bristot@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v3 1/1] trace,smp: Add tracepoints around remotelly
 called functions

On 10/05/23 20:01, Leonardo Bras wrote:
> When running RT workloads in isolated CPUs, many cases of deadline misses
> are caused by remote CPU requests such as smp_call_function*().
>
> For those cases, having the names of those functions running around the
> deadline miss moment could help (a lot) finding a target for the next
> improvements.
>
> Add tracepoints for acquiring the function name & csd before entry and
> after returning from the remote-cpu requested function.
>
> Also, add tracepoints on the remote cpus requesting them.

Per the v1 thread, I got some context as to why we want this, but this
changelog is quite sparse on the subject. How about:

"""
The recently added ipi_send_{cpu,cpumask} tracepoints allow finding sources
of IPIs targeting CPUs running latency-sensitive applications.

For NOHZ_FULL CPUs, all IPIs are interference, and those tracepoints are
sufficient to find them and work on getting rid of them. In some setups
however, not *all* IPIs are to be suppressed, but long-running IPI
callbacks can still be problematic.

Add a pair of tracepoints to mark the start and end of processing a CSD IPI
callback, similar to what exists for softirq, workqueue or timer callbacks.
"""

And you can probably split the csd_queue_cpu TP into a separate commit,
with something along the lines of:

"""
Add a tracepoint for when a CSD is queued to a remote CPU's
call_single_queue. This allows finding exactly which CPU queued a given CSD
when looking at a csd_function_{entry,exit} event, and also enables us to
accurately measure IPI delivery time with e.g. a synthetic event:

  $ echo 'hist:keys=cpu,csd.hex:ts=common_timestamp.usecs' >\
      /sys/kernel/tracing/events/smp/csd_queue_cpu/trigger
  $ echo 'csd_latency unsigned int dst_cpu; unsigned long csd; u64 time' >\
      /sys/kernel/tracing/synthetic_events
  $ echo \
  'hist:keys=common_cpu,csd.hex:'\
  'time=common_timestamp.usecs-$ts:'\
  'onmatch(smp.csd_queue_cpu).trace(csd_latency,common_cpu,csd,$time)' >\
      /sys/kernel/tracing/events/smp/csd_function_entry/trigger

  $ trace-cmd record -e 'synthetic:csd_latency' hackbench
  $ trace-cmd report
  <...>-467   [001]    21.824263: csd_queue_cpu:        cpu=0 callsite=try_to_wake_up+0x2ea func=sched_ttwu_pending csd=0xffff8880076148b8
  <...>-467   [001]    21.824280: ipi_send_cpu:         cpu=0 callsite=try_to_wake_up+0x2ea callback=generic_smp_call_function_single_interrupt+0x0
  <...>-489   [000]    21.824299: csd_function_entry:   func=sched_ttwu_pending csd=0xffff8880076148b8
  <...>-489   [000]    21.824320: csd_latency:          dst_cpu=0, csd=18446612682193848504, time=36
"""

>
> Signed-off-by: Leonardo Bras <leobras@...hat.com>

Overall I like the CSD vs IPI split, it's a saner approach than logging it
all as IPIs, even if it does generate more events as we have to emit an
event for every csd queued (i.e. no _cpumask() variant is possible).

Some nitpicks below, and one other thing: are we happy with the smp event
namespace, and with the new TP names? Should the namespace be csd instead,
to match the trace_<namespace>_* nomenclature?

> ---
>
> Changes since RFCv2:
> - Fixed some spacing issues and trace calls
>
> Changes since RFCv1:
> - Implemented trace_csd_queue_cpu() as suggested by Valentin Schneider
> - Using EVENT_CLASS in order to avoid duplication
> - Introduced new helper: csd_do_func()
> - Name change from smp_call_function_* to csd_function_*
> - Rebased on top of torvalds/master
>
>  include/trace/events/smp.h | 72 ++++++++++++++++++++++++++++++++++++++
>  kernel/smp.c               | 41 +++++++++++++---------
>  2 files changed, 96 insertions(+), 17 deletions(-)
>  create mode 100644 include/trace/events/smp.h
>
> diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
> new file mode 100644
> index 000000000000..c304318a0203
> --- /dev/null
> +++ b/include/trace/events/smp.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM smp
> +
> +#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_SMP_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(csd_queue_cpu,
> +
> +	TP_PROTO(const unsigned int cpu,
> +		 unsigned long callsite,
> +		 smp_call_func_t func,
> +		 call_single_data_t *csd),
> +
> +	TP_ARGS(cpu, callsite, func, csd),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned int, cpu)
> +		__field(void *, callsite)
> +		__field(void *, func)
> +		__field(void *, csd)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->cpu = cpu;
> +		__entry->callsite = (void *)callsite;
> +		__entry->func = func;
> +		__entry->csd  = csd;
> +	),
> +
> +	TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
                                       ^^^^^^^^
This can be func=%ps, offsets should always be 0 so not very useful (I know
I put %pS for the IPI TPs, that's a force of habit...)

> +		  __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> +);

> +
> +/*
> + * Tracepoints for a function which is called as an effect of smp_call_function.*
> + */
> +DECLARE_EVENT_CLASS(csd_function,
> +
> +	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> +
> +	TP_ARGS(func, csd),
> +
> +	TP_STRUCT__entry(
> +		__field(void *,	func)
> +		__field(void *,	csd)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->func	= func;
> +		__entry->csd	= csd;
> +	),
> +
> +	TP_printk("function %ps, csd = %p", __entry->func, __entry->csd)

To match the style of the other TPs, that should be:

        TP_printk("func=%ps csd=%p", __entry->func, __entry->csd)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ