[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260205094722.6d21d8b0@gandalf.local.home>
Date: Thu, 5 Feb 2026 09:47:22 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Colin Lord <clord@...olab.com>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, Masami
Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>
Subject: Re: [PATCH v2] trace/hwlat: prevent false sharing in get_sample()
On Sun, 1 Feb 2026 18:58:38 -0800
Colin Lord <clord@...olab.com> wrote:
I was about to send this as part of my fixes, but taking a closer look at
it, I have more questions.
> The get_sample() function in the hwlat tracer assumes the caller holds
> hwlat_data.lock, but this is not actually happening. The result is
> unprotected data access to hwlat_data, and in per-cpu mode can result in
> false sharing. The false sharing can cause false positive latency
BTW, what exactly do you mean by "false sharing"?
> events, since the sample_width member is involved and gets read as part
> of the main latency detection loop.
I'm trying to figure out why the change in sample_width would cause any
issue.
>
> Convert hwlat_data.count to atomic64_t so it can be safely accessed
> without locking, and prevent false sharing by pulling sample_width into
> a local variable.
The above still makes sense, but this only effects the seqnum, which may
show either duplicate or skipped numbers. But shouldn't affect any of the
other data.
>
> One system this was tested on was a dual socket server with 32 CPUs on
> each numa node. With settings of 1us threshold, 1000us width, and
> 2000us window, this change reduced the number of latency events from
> 500 per second down to approximately 1 event per minute. Some machines
> tested did not exhibit measurable latency from the false sharing.
Is this because the read of hwlat_data.sample_width is a global variable
and could possibly be causing a cache hit latency that shows up on large
machines?
Is that what you mean by "false sharing"?
Oh, and subjects for the tracing subsystem should start with a capital, and
should be something like:
tracing: Fix get_sample() in hwlat from ...
-- Steve
>
> Signed-off-by: Colin Lord <clord@...olab.com>
> ---
> Changes in v2:
> - convert hwlat_data.count to atomic64_t
> - leave irqs_disabled block where it originally was, outside of
> get_sample()
>
> Thanks for the v1 review Steve, have updated and retested.
>
> cheers,
> Colin
>
> kernel/trace/trace_hwlat.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index 2f7b94e98317..3fe274b84f1c 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -102,9 +102,9 @@ struct hwlat_sample {
> /* keep the global state somewhere. */
> static struct hwlat_data {
>
> - struct mutex lock; /* protect changes */
> + struct mutex lock; /* protect changes */
>
> - u64 count; /* total since reset */
> + atomic64_t count; /* total since reset */
>
> u64 sample_window; /* total sampling window (on+off) */
> u64 sample_width; /* active sampling portion of window */
> @@ -193,8 +193,7 @@ void trace_hwlat_callback(bool enter)
> * get_sample - sample the CPU TSC and look for likely hardware latencies
> *
> * Used to repeatedly capture the CPU TSC (or similar), looking for potential
> - * hardware-induced latency. Called with interrupts disabled and with
> - * hwlat_data.lock held.
> + * hardware-induced latency. Called with interrupts disabled.
> */
> static int get_sample(void)
> {
> @@ -204,6 +203,7 @@ static int get_sample(void)
> time_type start, t1, t2, last_t2;
> s64 diff, outer_diff, total, last_total = 0;
> u64 sample = 0;
> + u64 sample_width = READ_ONCE(hwlat_data.sample_width);
> u64 thresh = tracing_thresh;
> u64 outer_sample = 0;
> int ret = -1;
> @@ -267,7 +267,7 @@ static int get_sample(void)
> if (diff > sample)
> sample = diff; /* only want highest value */
>
> - } while (total <= hwlat_data.sample_width);
> + } while (total <= sample_width);
>
> barrier(); /* finish the above in the view for NMIs */
> trace_hwlat_callback_enabled = false;
> @@ -285,8 +285,7 @@ static int get_sample(void)
> if (kdata->nmi_total_ts)
> do_div(kdata->nmi_total_ts, NSEC_PER_USEC);
>
> - hwlat_data.count++;
> - s.seqnum = hwlat_data.count;
> + s.seqnum = atomic64_inc_return(&hwlat_data.count);
> s.duration = sample;
> s.outer_duration = outer_sample;
> s.nmi_total_ts = kdata->nmi_total_ts;
> @@ -832,7 +831,7 @@ static int hwlat_tracer_init(struct trace_array *tr)
>
> hwlat_trace = tr;
>
> - hwlat_data.count = 0;
> + atomic64_set(&hwlat_data.count, 0);
> tr->max_latency = 0;
> save_tracing_thresh = tracing_thresh;
>
>
> base-commit: 24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7
Powered by blists - more mailing lists