lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221103204910.GF5600@paulmck-ThinkPad-P17-Gen-1>
Date:   Thu, 3 Nov 2022 13:49:10 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Waiman Long <longman@...hat.com>
Cc:     tglx@...utronix.de, linux-kernel@...r.kernel.org,
        john.stultz@...aro.org, sboyd@...nel.org, corbet@....net,
        Mark.Rutland@....com, maz@...nel.org, kernel-team@...a.com,
        neeraju@...eaurora.org, ak@...ux.intel.com, feng.tang@...el.com,
        zhengjun.xing@...el.com, John Stultz <jstultz@...gle.com>
Subject: Re: [PATCH clocksource 2/2] clocksource: Exponential backoff for
 load-induced bogus watchdog reads

On Wed, Nov 02, 2022 at 10:28:05PM -0400, Waiman Long wrote:
> On 11/2/22 14:40, Paul E. McKenney wrote:
> > The clocksource watchdog will reject measurements that are excessively
> > delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
> > watchdog interval.  On an extremely busy system, this can result in a
> > console message being printed every two seconds.  This is excessively
> > noisy for a non-error condition.
> > 
> > Therefore, apply exponential backoff to these messages.  This exponential
> > backoff is capped at 1024 times the watchdog interval, which comes to
> > not quite one message per ten minutes.
> > 
> > Please note that the bogus watchdog reads that occur when the watchdog
> > interval is less than 0.125 seconds are still printed unconditionally
> > because these likely correspond to a serious error condition in the
> > timer code or hardware.
> > 
> > [ paulmck: Apply Feng Tang feedback. ]
> > 
> > Reported-by: Waiman Long <longman@...hat.com>
> > Reported-by: Feng Tang <feng.tang@...el.com>
> > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > Cc: John Stultz <jstultz@...gle.com>
> > Cc: Thomas Gleixner <tglx@...utronix.de>
> > Cc: Stephen Boyd <sboyd@...nel.org>
> > Cc: Feng Tang <feng.tang@...el.com>
> > Cc: Waiman Long <longman@...hat.com>
> > ---
> >   include/linux/clocksource.h |  4 ++++
> >   kernel/time/clocksource.c   | 31 +++++++++++++++++++++++++------
> >   2 files changed, 29 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> > index 1d42d4b173271..daac05aedf56a 100644
> > --- a/include/linux/clocksource.h
> > +++ b/include/linux/clocksource.h
> > @@ -125,6 +125,10 @@ struct clocksource {
> >   	struct list_head	wd_list;
> >   	u64			cs_last;
> >   	u64			wd_last;
> > +	u64			wd_last_bogus;
> > +	int			wd_bogus_shift;
> > +	unsigned long		wd_bogus_count;
> > +	unsigned long		wd_bogus_count_last;
> >   #endif
> >   	struct module		*owner;
> >   };
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 3f5317faf891f..de8047b6720f5 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
> >   		/* Check for bogus measurements. */
> >   		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
> > -		if (wd_nsec < (wdi >> 2)) {
> > -			/* This usually indicates broken timer code or hardware. */
> > -			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
> > +		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
> > +			bool needwarn = false;
> > +			u64 wd_lb;
> > +
> > +			cs->wd_bogus_count++;
> > +			if (!cs->wd_bogus_shift) {
> > +				needwarn = true;
> > +			} else {
> > +				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
> > +				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
> > +				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
> > +					needwarn = true;
> > +			}
> > +			if (needwarn) {
> > +				/* This can happen on busy systems, which can delay the watchdog. */
> > +				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count - cs->wd_bogus_count_last);
> > +				cs->wd_last_bogus = wdnow;
> > +				if (cs->wd_bogus_shift < 10)
> > +					cs->wd_bogus_shift++;
> > +				cs->wd_bogus_count_last = cs->wd_bogus_count;
> 
> I don't think you need to keep 2 counters to store the number of skipped
> warnings. How about just use a single wd_bogus_skip_count like
> 
> If (needwarn) {
>     pr_warn("... %lu additonal ...", ... cs->wd_bogus_skip_count);
>     cs->wd_bogus_skip_count = 0:
> } else {
>     cs->wd_bogus_skip_count++;
> }

Good point!  Perhaps I am too used to there being multiple entities taking
snapshots?  Anyway, does the updated version shown below look better?

							Thanx, Paul

------------------------------------------------------------------------

commit da44b8af99222ff8761a98ca8c00837a7d607d28
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Fri Oct 28 10:38:58 2022 -0700

    clocksource: Exponential backoff for load-induced bogus watchdog reads
    
    The clocksource watchdog will reject measurements that are excessively
    delayed, that is, by more than 1.5 seconds beyond the intended 0.5-second
    watchdog interval.  On an extremely busy system, this can result in a
    console message being printed every two seconds.  This is excessively
    noisy for a non-error condition.
    
    Therefore, apply exponential backoff to these messages.  This exponential
    backoff is capped at 1024 times the watchdog interval, which comes to
    not quite one message per ten minutes.
    
    Please note that the bogus watchdog reads that occur when the watchdog
    interval is less than 0.125 seconds are still printed unconditionally
    because these likely correspond to a serious error condition in the
    timer code or hardware.
    
    [ paulmck: Apply Feng Tang feedback. ]
    [ paulmck: Apply Waiman Long feedback. ]
    
    Reported-by: Waiman Long <longman@...hat.com>
    Reported-by: Feng Tang <feng.tang@...el.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
    Cc: John Stultz <jstultz@...gle.com>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Cc: Stephen Boyd <sboyd@...nel.org>
    Cc: Feng Tang <feng.tang@...el.com>
    Cc: Waiman Long <longman@...hat.com>

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 1d42d4b173271..23b73f2293d6d 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -125,6 +125,9 @@ struct clocksource {
 	struct list_head	wd_list;
 	u64			cs_last;
 	u64			wd_last;
+	u64			wd_last_bogus;
+	int			wd_bogus_shift;
+	unsigned long		wd_bogus_count;
 #endif
 	struct module		*owner;
 };
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 3f5317faf891f..1eefb56505350 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -442,14 +442,33 @@ static void clocksource_watchdog(struct timer_list *unused)
 
 		/* Check for bogus measurements. */
 		wdi = jiffies_to_nsecs(WATCHDOG_INTERVAL);
-		if (wd_nsec < (wdi >> 2)) {
-			/* This usually indicates broken timer code or hardware. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+		if (wd_nsec > (wdi << 2) || cs_nsec > (wdi << 2)) {
+			bool needwarn = false;
+			u64 wd_lb;
+
+			cs->wd_bogus_count++;
+			if (!cs->wd_bogus_shift) {
+				needwarn = true;
+			} else {
+				delta = clocksource_delta(wdnow, cs->wd_last_bogus, watchdog->mask);
+				wd_lb = clocksource_cyc2ns(delta, watchdog->mult, watchdog->shift);
+				if ((1 << cs->wd_bogus_shift) * wdi <= wd_lb)
+					needwarn = true;
+			}
+			if (needwarn) {
+				/* This can happen on busy systems, which can delay the watchdog. */
+				pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval (%lu additional), probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL, cs->wd_bogus_count);
+				cs->wd_last_bogus = wdnow;
+				if (cs->wd_bogus_shift < 10)
+					cs->wd_bogus_shift++;
+				cs->wd_bogus_count = 0;
+			}
 			continue;
 		}
-		if (wd_nsec > (wdi << 2)) {
-			/* This can happen on busy systems, which can delay the watchdog. */
-			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced an excessive %lld ns during %d-jiffy time interval, probable CPU overutilization, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
+		/* Check too-short measurements second to handle wrap. */
+		if (wd_nsec < (wdi >> 2) || cs_nsec < (wdi >> 2)) {
+			/* This usually indicates broken timer code or hardware. */
+			pr_warn("timekeeping watchdog on CPU%d: Watchdog clocksource '%s' advanced only %lld ns during %d-jiffy time interval, skipping watchdog check.\n", smp_processor_id(), watchdog->name, wd_nsec, WATCHDOG_INTERVAL);
 			continue;
 		}
 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ