[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YKKFOoMVfAZtDWqE@alley>
Date: Mon, 17 May 2021 17:01:14 +0200
From: Petr Mladek <pmladek@...e.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Laurence Oberman <loberman@...hat.com>,
Vincent Whitchurch <vincent.whitchurch@...s.com>,
Michal Hocko <mhocko@...e.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 6/7] watchdog: Cleanup handling of false positives
On Sun 2021-05-16 19:46:21, Sergey Senozhatsky wrote:
> Hi,
>
> // This was never in my inbox, so sorry if I mess up the "Reply-to"
> // Original message: https://lore.kernel.org/lkml/20210311122130.6788-7-pmladek@suse.com/
>
>
> >@@ -375,7 +375,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > /* .. and repeat */
> > hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));
> >
> > - /* Reset the interval when touched externally by a known slow code. */
> > + /*
> > + * If a virtual machine is stopped by the host it can look to
> > + * the watchdog like a soft lockup. Check to see if the host
> > + * stopped the vm before we process the timestamps.
> > + */
> > + kvm_check_and_clear_guest_paused();
> > +
> [..]
> >@@ -401,14 +405,6 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> > */
> > duration = is_softlockup(touch_ts, period_ts);
> > if (unlikely(duration)) {
> > - /*
> > - * If a virtual machine is stopped by the host it can look to
> > - * the watchdog like a soft lockup, check to see if the host
> > - * stopped the vm before we issue the warning
> > - */
> > - if (kvm_check_and_clear_guest_paused())
> > - return HRTIMER_RESTART;
>
> This looks racy to me. I believe kvm_check_and_clear_guest_paused()
> was in the right place.
>
> VCPU can be scheduled out/preepmpted any time at any point; and then
> guest VM (or even the entire system) can be suspended. When we resume
> the VM we continue from where we were preempted (from VCPU POW).
>
> So what the old code did
>
> watchdog_timer_fn()
> {
> ...
> <<!!>>
>
> // Suppose we are suspended here. When we are getting resumed
> // jiffies jump forward, which may look like a soft lockup.
> duration = is_softlockup(touch_ts, period_ts);
> if (unlikely(duration)) {
> // And this is where kvm_check_and_clear_guest_paused()
> // jumps in. We know already that jiffies have jumped,
> // we don't know if jiffies jumped because the VM was
> // suspended. And this is what we figure out here and
> // bail out
> if (kvm_check_and_clear_guest_paused())
> return HRTIMER_RESTART;
> }
> }
>
> The new code does the following
>
> watchdog_timer_fn()
> {
> ...
> kvm_check_and_clear_guest_paused(); // PVCLOCK_GUEST_STOPPED is not set
>
> <<!!>>
>
> // Suppose the VM got suspended at this point. PVCLOCK_GUEST_STOPPED
> // is set, but we don't check it. jiffies will jump and this will look
> // like a lockup, but we don't check if jiffies jumped because the VM
> // was suspended
> duration = is_softlockup(touch_ts, period_ts);
> if (unlikely(duration)) {
> // report the lockup and perhaps panic the system,
> // depending on the configuration
> }
> }
>
> What am I missing?
Great catch! You have a point.
Well, I think that the entire code is racy. touch_ts and period_ts are
set by:
unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
unsigned long period_ts = __this_cpu_read(watchdog_report_ts);
They are neither volatile not there are any barriers. It means that
period_ts might be re-read in these two checks:
/* Reset the interval when touched by known problematic code. */
if (period_ts == SOFTLOCKUP_DELAY_REPORT) {
update_report_ts();
return HRTIMER_RESTART;
}
and
duration = is_softlockup(touch_ts, period_ts);
where:
static int is_softlockup(unsigned long touch_ts, unsigned long period_ts)
{
unsigned long now = get_timestamp();
if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
/* Warn about unreasonable delays. */
if (time_after(now, period_ts + get_softlockup_thresh()))
return now - touch_ts;
}
return 0;
}
Now, if the watchdog is touched from NMI. period_ts might be
SOFTLOCKUP_DELAY_REPORT. It is ULONG_MAX.
As a result period_ts + get_softlockup_thresh() would overflow and
we could report softlockup even when there is none.
I probably does not happen because the per-CPU variable is read only
once. And watchdogs are not typically touched from NMI. Except that
show_regs() actually touch the watchdog.
That said. This patch most likely made things worse and should be
reverted. But even better solution would be to remove this race.
I mean to make the code safe and sane at the same time.
Best Regards,
Petr
Powered by blists - more mailing lists