linux-kernel - Re: [patch 01/11] hrtimer: Add a mechanism to catch runaway timers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACT4Y+aOFxbm6qny_FO0rz0_iSZw04W3Sdqe-zPtGYdvNNzHbw@mail.gmail.com>
Date:   Fri, 24 Sep 2021 10:16:15 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [patch 01/11] hrtimer: Add a mechanism to catch runaway timers

On Thu, 23 Sept 2021 at 18:04, Thomas Gleixner <tglx@...utronix.de> wrote:
>
> A recent report from syzbot unearthed a problem with self rearming timers
> which fire late and try to catch up to now with a short period. That causes
> the timer to be rearmed in the past until it eventually catches up with
> now. If that rearming happens from the timer callback the hard or soft
> interrupt expiry loop can run for a long time with either interrupts or
> bottom halves disabled which causes RCU stalls and other lockups.
>
> There is no safety net to stop or at least identify such runaway timers.
>
> Detection is trivial. Cache the pointer to the last expired timer. The next
> invocation from the same loop compares the pointer with the next expiring
> hrtimer pointer and if they match 10 times in a row (in the same hard or
> soft interrupt expiry instance) then it's reasonable to declare it as a
> runaway.
>
> In that case emit a warning and skip the callback invocation which stops
> the misbehaving timer right there.
>
> It's obviously incomplete, but it's definitely better than nothing and would
> have caught the reported issue in mac80211_hwsim.
>
> Suggested-by: Dmitry Vyukov <dvyukov@...gle.com>
> Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
> ---
>  kernel/time/hrtimer.c |   25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1714,6 +1714,13 @@ static void __run_hrtimer(struct hrtimer
>         base->running = NULL;
>  }
>
> +static void hrtimer_del_runaway(struct hrtimer_clock_base *base,
> +                               struct hrtimer *timer)
> +{
> +       __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE, 0);
> +       pr_warn("Runaway hrtimer %p %ps stopped\n", timer, timer->function);

Thanks for implementing this, Thomas.
Please use some standard kernel bug reporting facility here, e.g. WARN
I think will be appropriate. The ad-hoc format won't be recognized by
any testing system.
Otherwise looks good to me.

> +}
> +
>  static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now,
>                                  unsigned long flags, unsigned int active_mask)
>  {
> @@ -1722,6 +1729,8 @@ static void __hrtimer_run_queues(struct
>
>         for_each_active_base(base, cpu_base, active) {
>                 struct timerqueue_node *node;
> +               struct hrtimer *last = NULL;
> +               unsigned int cnt = 0;
>                 ktime_t basenow;
>
>                 basenow = ktime_add(now, base->offset);
> @@ -1732,6 +1741,22 @@ static void __hrtimer_run_queues(struct
>                         timer = container_of(node, struct hrtimer, node);
>
>                         /*
> +                        * Catch timers which rearm themself with a expiry
> +                        * time in the past over and over which makes this
> +                        * loop run forever.
> +                        */
> +                       if (IS_ENABLED(CONFIG_DEBUG_OBJECTS_TIMERS)) {
> +                               if (unlikely(last == timer)) {
> +                                       if (++cnt == 10) {
> +                                               hrtimer_del_runaway(base, timer);
> +                                               continue;
> +                                       }
> +                               }
> +                               last = timer;
> +                               cnt = 0;
> +                       }
> +
> +                       /*
>                          * The immediate goal for using the softexpires is
>                          * minimizing wakeups, not running timers at the
>                          * earliest interrupt after their soft expiration.
>