Message-ID: <ba64c1a2-c334-4e04-8641-7a7d2abaee5c@paulmck-laptop>
Date: Fri, 23 Aug 2024 05:06:38 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Zhu Qiyu <qiyuzhu2@....com>
Cc: neeraj.upadhyay@...nel.org, riel@...riel.com, leobras@...hat.com,
tglx@...utronix.de, thorsten.blum@...lux.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] locking/csd_lock: fix csd_lock_wait_toolong() error
warning
On Fri, Aug 23, 2024 at 08:00:46AM +0000, Zhu Qiyu wrote:
> From: "Zhu Qiyu" <qiyuzhu2@....com>
>
> In AC cycle stress tests, it was found that csd_lock_wait_toolong()
> printed its warning message even though the 5s threshold had not been
> reached; the probability of occurrence was about 0.58%-0.87%.
>
> This is due to out-of-order execution by the CPU, which causes ts2
> to be used before it is initialized. The original value of ts2 on the
> stack is random, which may cause ts_delta = ts2 - *ts1 to be greater
> than 5s, thus triggering the warning.
>
> The solution is to add a memory barrier after reading ts2, which
> effectively avoids this issue.
>
> Signed-off-by: Zhu Qiyu <qiyuzhu2@....com>
Good catch! But does the following commit help?
7ba82f5d831b ("locking/csd-lock: Switch from sched_clock() to ktime_get_mono_fast_ns()")
This is in the "dev" branch of the -rcu tree:
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
This commit switches from sched_clock() to ktime_get_mono_fast_ns()
in csd_lock_wait_toolong() and __csd_lock_wait(). Because this code
runs with preemption disabled, both calls to ktime_get_mono_fast_ns()
will be from the same CPU. Furthermore, neither call will ever be from
an NMI handler. Therefore, the times returned from those two call sites
should never go backwards.
Please see the end of this email for the corresponding patch.
Thanx, Paul
> ---
> kernel/smp.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index f25e20617b7e..e58678d424a4 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -247,6 +247,8 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
>  	}
>  
>  	ts2 = sched_clock();
> +	/* Avoid ts2 being used before it is initialized */
> +	mb();
>  	/* How long since we last checked for a stuck CSD lock.*/
>  	ts_delta = ts2 - *ts1;
>  	if (likely(ts_delta <= csd_lock_timeout_ns * (*nmessages + 1) *
------------------------------------------------------------------------
commit 7ba82f5d831bcddfa22b6b782c5e254c5580bc07
Author: Paul E. McKenney <paulmck@...nel.org>
Date: Mon Aug 5 11:44:43 2024 -0700
locking/csd-lock: Switch from sched_clock() to ktime_get_mono_fast_ns()
Currently, the CONFIG_CSD_LOCK_WAIT_DEBUG code uses sched_clock()
to check for excessive CSD-lock wait times. This works, but does not
guarantee monotonic timestamps. Therefore, switch from sched_clock()
to ktime_get_mono_fast_ns(), which does guarantee monotonic timestamps,
at least in the absence of calls from NMI handlers, which are not involved
in this code path.
Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
Cc: Neeraj Upadhyay <neeraj.upadhyay@...nel.org>
Cc: Rik van Riel <riel@...riel.com>
Cc: Leonardo Bras <leobras@...hat.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@...radead.org>
diff --git a/kernel/smp.c b/kernel/smp.c
index f25e20617b7eb..27dc31a146a35 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -246,7 +246,7 @@ static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, in
 		return true;
 	}
 
-	ts2 = sched_clock();
+	ts2 = ktime_get_mono_fast_ns();
 	/* How long since we last checked for a stuck CSD lock.*/
 	ts_delta = ts2 - *ts1;
 	if (likely(ts_delta <= csd_lock_timeout_ns * (*nmessages + 1) *
@@ -321,7 +321,7 @@ static void __csd_lock_wait(call_single_data_t *csd)
 	int bug_id = 0;
 	u64 ts0, ts1;
 
-	ts1 = ts0 = sched_clock();
+	ts1 = ts0 = ktime_get_mono_fast_ns();
 	for (;;) {
 		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id, &nmessages))
 			break;