linux-kernel - Re: [PATCH] clocksource: disable irq when holding watchdog

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANDhNCpw+hEHNbtdAZR01HsHW_L1C0BXjZq21eXouQGNnYuUNQ@mail.gmail.com>
Date:   Mon, 16 Oct 2023 10:46:52 -0700
From:   John Stultz <jstultz@...gle.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Stephen Boyd <sboyd@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] clocksource: disable irq when holding watchdog_lock.

On Fri, Oct 13, 2023 at 7:51 AM Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
>
> Lockdep found that spin_lock(&watchdog_lock) from call_timer_fn()
> is not safe. Use spin_lock_irqsave(&watchdog_lock, flags) instead.
>
> [    0.378387] TSC synchronization [CPU#0 -> CPU#1]:
> [    0.378387] Measured 55060 cycles TSC warp between CPUs, turning off TSC clock.
> [    0.378387] tsc: Marking TSC unstable due to check_tsc_sync_source failed
> [    0.926101]
> [    0.926387] ================================
> [    0.926387] WARNING: inconsistent lock state
> [    0.926387] 6.6.0-rc5-00192-g10a6e5feccb8 #2 Not tainted
> [    0.926387] --------------------------------
> [    0.926387] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [    0.926387] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> [    0.926387] ffffffff8cfa1c78 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x23/0x5a0
> [    0.926387] {IN-HARDIRQ-W} state was registered at:
> [    0.926387]   lock_acquire+0xc1/0x2c0
> [    0.926387]   _raw_spin_lock_irqsave+0x3f/0x60
> [    0.926387]   clocksource_mark_unstable+0x1b/0x90
> [    0.926387]   mark_tsc_unstable+0x41/0x50
> [    0.926387]   check_tsc_sync_source+0x14f/0x180
> [    0.926387]   __flush_smp_call_function_queue+0x16f/0x560
> [    0.926387]   __sysvec_call_function_single+0x36/0x110
> [    0.926387]   sysvec_call_function_single+0x69/0x90
> [    0.926387]   asm_sysvec_call_function_single+0x1a/0x20
> [    0.926387]   default_idle+0xf/0x20
> [    0.926387]   default_idle_call+0x7f/0x180
> [    0.926387]   do_idle+0x1e1/0x220
> [    0.926387]   cpu_startup_entry+0x2a/0x30
> [    0.926387]   rest_init+0xf4/0x190
> [    0.926387]   arch_call_rest_init+0xe/0x30
> [    0.926387]   start_kernel+0x763/0x910
> [    0.926387]   x86_64_start_reservations+0x18/0x30
> [    0.926387]   x86_64_start_kernel+0xca/0xe0
> [    0.926387]   secondary_startup_64_no_verify+0x178/0x17b
> [    0.926387] irq event stamp: 138774
> [    0.926387] hardirqs last  enabled at (138774): [<ffffffff8bb99968>] _raw_spin_unlock_irq+0x28/0x50
> [    0.926387] hardirqs last disabled at (138773): [<ffffffff8bb99687>] _raw_spin_lock_irq+0x47/0x50
> [    0.926387] softirqs last  enabled at (138216): [<ffffffff8b0904ff>] irq_exit_rcu+0x7f/0xa0
> [    0.926387] softirqs last disabled at (138267): [<ffffffff8b0904ff>] irq_exit_rcu+0x7f/0xa0
> [    0.926387]
> [    0.926387] other info that might help us debug this:
> [    0.926387]  Possible unsafe locking scenario:
> [    0.926387]
> [    0.926387]        CPU0
> [    0.926387]        ----
> [    0.926387]   lock(watchdog_lock);
> [    0.926387]   <Interrupt>
> [    0.926387]     lock(watchdog_lock);
> [    0.926387]
> [    0.926387]  *** DEADLOCK ***
...
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index c108ed8a9804..4e8fc0a5ca9d 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -405,8 +405,9 @@ static void clocksource_watchdog(struct timer_list *unused)
>         enum wd_read_status read_ret;
>         unsigned long extra_wait = 0;
>         u32 md;
> +       unsigned long flags;
>
> -       spin_lock(&watchdog_lock);
> +       spin_lock_irqsave(&watchdog_lock, flags);
>         if (!watchdog_running)
>                 goto out;
>
> @@ -554,7 +555,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>                 add_timer_on(&watchdog_timer, next_cpu);
>         }
>  out:
> -       spin_unlock(&watchdog_lock);
> +       spin_unlock_irqrestore(&watchdog_lock, flags);
>  }
>
>  static inline void clocksource_start_watchdog(void)
> --

Hey! Thank you for the report and the patch!

At first glance this looks sane to me.  The only thing that gives me
pause is that I'm a little shocked this hasn't been caught earlier,
and I don't right off see recent changes that would have caused this
to trip. But Thomas may have more insight.

Tentatively:
Acked-by: John Stultz <jstultz@...gle.com>

thanks
-john