Date:   Mon, 16 Oct 2023 16:03:47 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     John Stultz <jstultz@...gle.com>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Stephen Boyd <sboyd@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        x86@...nel.org
Subject: Re: [PATCH] clocksource: disable irq when holding watchdog_lock.

On Mon, Oct 16, 2023 at 11:47:55PM +0200, Thomas Gleixner wrote:
> On Mon, Oct 16 2023 at 10:46, John Stultz wrote:
> > On Fri, Oct 13, 2023 at 7:51 AM Tetsuo Handa
> > <penguin-kernel@...ove.sakura.ne.jp> wrote:
> >>
> >> Lockdep found that spin_lock(&watchdog_lock) from call_timer_fn()
> >> is not safe. Use spin_lock_irqsave(&watchdog_lock, flags) instead.
> >>
> >> [    0.378387] TSC synchronization [CPU#0 -> CPU#1]:
> >> [    0.378387] Measured 55060 cycles TSC warp between CPUs, turning off TSC clock.

[ . . . ]

> Something like the uncompiled/untested below should cure it for real. It
> really does not matter whether the TSC unstable event happens a bit
> later. The system is unhappy no matter what.

This does pass my acceptance tests:

Tested-by: Paul E. McKenney <paulmck@...nel.org>

> That said, this whole clocksource watchdog mess wants a proper
> overhaul. It has become a pile of warts and duct tape by now and after
> staring at it long enough there is no real reason to run it in a timer
> callback anymore. It just can move into delayed work and the whole
> locking problem can be reduced to the clocksource_mutex and some well
> thought out atomic operations to handle the mark unstable case. But
> that's a different story and not relevant for curing the problem at
> hand.

Moving the code to delayed work seems quite reasonable.

But Thomas, you do understand that the way things have been going for
the clocksource watchdog, pushing it out to delayed work will no doubt
add yet more hair on large busy systems, right?  Yeah, yeah, I know,
delayed work shouldn't be any worse than ksoftirqd.  The key word of
course being "shouldn't".  ;-)

							Thanx, Paul

> Thanks,
> 
>         tglx
> ---
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -15,6 +15,7 @@
>   * ( The serial nature of the boot logic and the CPU hotplug lock
>   *   protects against more than 2 CPUs entering this code. )
>   */
> +#include <linux/workqueue.h>
>  #include <linux/topology.h>
>  #include <linux/spinlock.h>
>  #include <linux/kernel.h>
> @@ -342,6 +343,13 @@ static inline unsigned int loop_timeout(
>  	return (cpumask_weight(topology_core_cpumask(cpu)) > 1) ? 2 : 20;
>  }
>  
> +static void tsc_sync_mark_tsc_unstable(struct work_struct *work)
> +{
> +	mark_tsc_unstable("check_tsc_sync_source failed");
> +}
> +
> +static DECLARE_WORK(tsc_sync_work, tsc_sync_mark_tsc_unstable);
> +
>  /*
>   * The freshly booted CPU initiates this via an async SMP function call.
>   */
> @@ -395,7 +403,7 @@ static void check_tsc_sync_source(void *
>  			"turning off TSC clock.\n", max_warp);
>  		if (random_warps)
>  			pr_warn("TSC warped randomly between CPUs\n");
> -		mark_tsc_unstable("check_tsc_sync_source failed");
> +		schedule_work(&tsc_sync_work);
>  	}
>  
>  	/*
