Date:   Mon, 11 Oct 2021 21:52:29 -0700
From:   John Stultz <john.stultz@...aro.org>
To:     brookxu <brookxu.cn@...il.com>
Cc:     yanghui <yanghui.def@...edance.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Stephen Boyd <sboyd@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] Clocksource: Avoid misjudgment of clocksource

On Sat, Oct 9, 2021 at 7:04 AM brookxu <brookxu.cn@...il.com> wrote:
>
> hello
>
> John Stultz wrote on 2021/10/9 7:45:
> > On Fri, Oct 8, 2021 at 1:03 AM yanghui <yanghui.def@...edance.com> wrote:
> >>
> >> clocksource_watchdog is executed every WATCHDOG_INTERVAL (0.5s) by a
> >> timer. But sometimes the system is very busy and the timer cannot be
> >> executed within 0.5s. For example, if clocksource_watchdog is executed
> >> after 10s, the calculated value of abs(cs_nsec - wd_nsec) will
> >> be enlarged. Then the current clocksource will be misjudged as
> >> unstable. So we add conditions to prevent the clocksource from
> >> being misjudged.
> >>
> >> Signed-off-by: yanghui <yanghui.def@...edance.com>
> >> ---
> >>  kernel/time/clocksource.c | 6 +++++-
> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> >> index b8a14d2fb5ba..d535beadcbc8 100644
> >> --- a/kernel/time/clocksource.c
> >> +++ b/kernel/time/clocksource.c
> >> @@ -136,8 +136,10 @@ static void __clocksource_change_rating(struct clocksource *cs, int rating);
> >>
> >>  /*
> >>   * Interval: 0.5sec.
> >> + * MaxInterval: 1s.
> >>   */
> >>  #define WATCHDOG_INTERVAL (HZ >> 1)
> >> +#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
> >>
> >>  static void clocksource_watchdog_work(struct work_struct *work)
> >>  {
> >> @@ -404,7 +406,9 @@ static void clocksource_watchdog(struct timer_list *unused)
> >>
> >>                 /* Check the deviation from the watchdog clocksource. */
> >>                 md = cs->uncertainty_margin + watchdog->uncertainty_margin;
> >> -               if (abs(cs_nsec - wd_nsec) > md) {
> >> +               if ((abs(cs_nsec - wd_nsec) > md) &&
> >> +                       cs_nsec < WATCHDOG_MAX_INTERVAL_NS &&
> >
> > Sorry, it's been a while since I looked at this code, but why are you
> > bounding the clocksource delta here?
> > It seems like if the clocksource being watched was very wrong (with a
> > delta larger than the MAX_INTERVAL_NS), we'd want to throw it out.
> >
> >> +                       wd_nsec < WATCHDOG_MAX_INTERVAL_NS) {
> >
> > Bounding the watchdog interval on the check does seem reasonable.
> > Though one may want to keep track that if we are seeing too many of
> > these delayed watchdog checks we provide some feedback via dmesg.
>
> For some timers that overflow quickly, such as the acpi-timer, checking
> wd_nsec doesn't make much sense, because by the time the watchdog runs,
> the timer may have overflowed many times.
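
The commit message's premise, that a late watchdog run enlarges abs(cs_nsec - wd_nsec), can be illustrated with a small C sketch. It assumes the two clocks differ only by a constant rate error in parts per million; that model, the 50 ppm figure, and the 100 us margin used below are hypothetical illustration values, not the kernel's actual uncertainty_margin settings. deviation_ns() and exceeds_margin() are made-up helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Deviation accumulated over one watchdog interval when the watched
 * clocksource runs fast or slow by a constant rate error of `ppm`
 * parts per million relative to the watchdog. The constant-rate model
 * is a simplifying assumption for illustration only.
 */
static uint64_t deviation_ns(uint64_t interval_ns, uint64_t ppm)
{
	/* Keep interval_ns * ppm below UINT64_MAX for ppm up to 1e6. */
	assert(ppm <= 1000000 && interval_ns <= UINT64_MAX / 1000000);
	return interval_ns * ppm / 1000000;
}

/* Same shape as the patch's test: flag when the deviation exceeds md. */
static bool exceeds_margin(uint64_t interval_ns, uint64_t ppm, uint64_t md_ns)
{
	return deviation_ns(interval_ns, ppm) > md_ns;
}
```

With a 50 ppm error, a check at the nominal 0.5 s interval sees 25 us of deviation, while a check delayed to 10 s sees 500 us. Against a fixed 100 us margin, the same pair of clocks passes the first check and fails the second, which is the misjudgment the patch describes.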

Indeed. But in that case we can't tell which way is up. This is what I
was fretting about when I said:
> So I do worry these watchdog robustness fixes are papering over a
> problem, pushing expectations closer to the edge of how far the system
> should tolerate bad behavior. Because at some point we'll fall off. :)

If the timer is delayed long enough for the watchdog to wrap, we're
way out of tolerable behavior. There's not much we can do because we
can't even tell what happened.
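
To make the wrap case concrete, here is a C sketch of what a masked read-to-read cycle delta, (now - last) & mask, reports once a counter has wrapped. The example numbers assume the 24-bit variant of the ACPI PM timer, which counts at 3.579545 MHz; wrap_interval_ns() and observed_ns() are hypothetical helper names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Real time for a free-running counter with the given mask to wrap once. */
static uint64_t wrap_interval_ns(uint64_t mask, uint64_t freq_hz)
{
	/* (mask + 1) * 1e9 fits in 64 bits for masks up to 32 bits wide. */
	assert(freq_hz > 0 && mask <= 0xffffffffULL);
	return (mask + 1) * 1000000000ULL / freq_hz;
}

/*
 * Nanoseconds a masked cycle delta reports after `elapsed_ns` of real
 * time has passed: every full wrap of the counter is silently lost.
 */
static uint64_t observed_ns(uint64_t elapsed_ns, uint64_t mask, uint64_t freq_hz)
{
	assert(freq_hz > 0 && mask <= 0xffffffffULL);
	assert(elapsed_ns <= UINT64_MAX / freq_hz);

	uint64_t cycles = elapsed_ns * freq_hz / 1000000000ULL; /* true cycles */
	uint64_t delta = cycles & mask;                          /* masked delta */

	return delta * 1000000000ULL / freq_hz;
}
```

For a 24-bit counter at 3,579,545 Hz, wrap_interval_ns(0xffffff, 3579545) comes out at roughly 4.687 s, so the 10 s delay from the commit message wraps the counter twice and observed_ns() reports only about 0.626 s. That reading is below the patch's 1 s bound, so a delayed, wrapped watchdog can look like a perfectly normal interval.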

But in the case where the watchdog has not wrapped, I don't see a
major issue with trying to be a bit more robust in the face of just
slightly delayed timers.
(And yes, we can't really distinguish between a slightly delayed timer and
one delayed by the watchdog wrap interval plus a slight delay, but in either
case we can probably skip disqualifying the clocksource, as we know
something seems off.)
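
The direction suggested in this thread, bounding only the watchdog interval (not cs_nsec) and giving some dmesg feedback when delayed checks pile up, could look roughly like the user-space C sketch below. WATCHDOG_MAX_INTERVAL_NS comes from the patch; wd_check(), struct wd_state, SKIP_WARN_THRESHOLD and the printf() standing in for a kernel log message are hypothetical, not existing kernel code. As noted above, a bound like this still cannot detect a watchdog that has already wrapped.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000LL
/* Bound taken from the patch under discussion. */
#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
/* Hypothetical: number of skipped checks before logging a message. */
#define SKIP_WARN_THRESHOLD 3

enum wd_result { WD_OK, WD_SKIPPED, WD_UNSTABLE };

/* Hypothetical per-watchdog state tracking delayed checks. */
struct wd_state {
	unsigned int skipped;
};

static enum wd_result wd_check(struct wd_state *st, int64_t cs_nsec,
			       int64_t wd_nsec, int64_t md)
{
	/*
	 * Bound only the watchdog interval: a late timer says something is
	 * off with the system, while a wildly wrong cs_nsec is exactly what
	 * the watchdog exists to catch, so it is not bounded here.
	 */
	if (wd_nsec > WATCHDOG_MAX_INTERVAL_NS) {
		if (++st->skipped == SKIP_WARN_THRESHOLD)
			printf("clocksource watchdog: %u delayed checks skipped\n",
			       st->skipped);
		return WD_SKIPPED;
	}

	int64_t diff = cs_nsec - wd_nsec;
	if (diff < 0)
		diff = -diff;

	return diff > md ? WD_UNSTABLE : WD_OK;
}
```

With this shape, a clocksource that reads 5 s while the watchdog saw a normal 0.5 s is still rejected, whereas a 10 s gap on the watchdog side is skipped and counted instead of disqualifying the clocksource.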

thanks
-john
