linux-kernel - Re: [PATCH clocksource] Reject bogus watchdog clocksource measurements

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221028175245.GN5600@paulmck-ThinkPad-P17-Gen-1>
Date:   Fri, 28 Oct 2022 10:52:45 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Feng Tang <feng.tang@...el.com>
Cc:     linux-kernel@...r.kernel.org, clm@...a.com, jstultz@...gle.com,
        tglx@...utronix.de, sboyd@...nel.org, longman@...hat.com
Subject: Re: [PATCH clocksource] Reject bogus watchdog clocksource
 measurements

On Fri, Oct 21, 2022 at 08:46:32AM +0800, Feng Tang wrote:
> On Thu, Oct 20, 2022 at 07:09:44AM -0700, Paul E. McKenney wrote:
> [...]
> > > >  static void clocksource_watchdog(struct timer_list *unused)
> > > >  {
> > > > -	u64 csnow, wdnow, cslast, wdlast, delta;
> > > > +	u64 csnow, wdnow, cslast, wdlast, delta, wdi;
> > > >  	int next_cpu, reset_pending;
> > > >  	int64_t wd_nsec, cs_nsec;
> > > >  	struct clocksource *cs;
> > > > @@ -440,6 +440,17 @@ static void clocksource_watchdog(struct timer_list *unused)
> > > >  		if (atomic_read(&watchdog_reset_pending))
> > > >  			continue;
> > > >  
> > > > +		/* Check for bogus measurements. */
> > > > +		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> > > > +		if (wd_nsec < (wdi >> 2)) {
> > > > +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > > > +			continue;
> > > > +		}
> > > 
> > > If this happens (500ms timer happens only after less than 125ms),
> > > there is some severe problem with timer/interrupt system. 
> > 
> > Should I add ", suspect timer/interrupt bug" just after "jiffy time
> > interval"?  Or would a comment before that pr_warn() work better for you?
> 
> Both are fine for me.
> 
> > > > +		if (wd_nsec > (wdi << 2)) {
> > > > +			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > > > +			continue;
> > > > +		}
> > > 
> > > I agree with Waiman that some rate limiting may be needed. As there
> > > were reports of hundreds of seconds of delay, 2 seconds delay could
> > > easily happen if a system is too busy or misbehave to trigger this
> > > problem.
> > 
> > Good points, thank you both!
> > 
> > Left to myself, I would use a capped power-of-two backoff that was reset
> > any time that the interval was within bounds.  Maybe a cap of 10 minutes?
> 
> Yep, that's more informative. Thanks!

And here is the resulting patch.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit adc0512cf34fe7f0d73966d59644b826ee526742
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Fri Oct 28 10:38:58 2022 -0700

    clocksource: Exponential backoff for load-induced bogus watchdog reads
    
    The clocksource watchdog will reject measurements that are excessively
    delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
    watchdog interval.  On an extremely busy system, this can result in a
    console message being printed every two seconds.  This is excessively
    noisy for a non-error condition.
    
    Therefore, apply exponential backoff to these messages.  This exponential
    backoff is capped at 1024 times the watchdog interval, which comes to
    not quite one message per ten minutes.
    
    Please note that the bogus watchdog reads that occur when the watchdog
    interval is less than 0.125 seconds are still printed unconditionally
    because these likely correspond to a serious error condition in the
    timer code or hardware.
    
    Reported-by: Waiman Long <longman@...hat.com>
    Reported-by: Feng Tang <feng.tang@...el.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
    Cc: John Stultz <jstultz@...gle.com>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Cc: Stephen Boyd <sboyd@...nel.org>
    Cc: Feng Tang <feng.tang@...el.com>
    Cc: Waiman Long <longman@...hat.com>

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 1d42d4b173271..daac05aedf56a 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -125,6 +125,10 @@ struct clocksource {
 	struct list_head	wd_list;
 	u64			cs_last;
 	u64			wd_last;
+	u64			wd_last_bogus;
+	int			wd_bogus_shift;
+	unsigned long		wd_bogus_count;
+	unsigned long		wd_bogus_count_last;
 #endif
 	struct module		*owner;
 };
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f5317faf891f..6537ffa02e445 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -448,8 +448,26 @@ static void clocksource_watchdog(struct timer_list *unused)
 			continue;
 		}
 		if (wd_nsec > (wdi << 2)) {
-			/* This can happen on busy systems, which can delay the watchdog. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+			bool needwarn = false;
+			u64 wd_lb;
+
+			cs->wd_bogus_count++;
+			if (!cs->wd_bogus_shift) {
+				needwarn = true;
+			} else {
+				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
+				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
+					needwarn = true;
+			}
+			if (needwarn) {
+				/* This can happen on busy systems, which can delay the watchdog. */
+				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
+				cs->wd_last_bogus = wdnow;
+				if (cs->wd_bogus_shift < 10)
+					cs->wd_bogus_shift++;
+				cs->wd_bogus_count_last = cs->wd_bogus_count;
+			}
 			continue;
 		}