[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALAqxLVcc9nscuWT-qFH=JbatVL0c5AxH5B9y3qE1ekG=BZ0aA@mail.gmail.com>
Date: Mon, 18 Oct 2021 09:14:33 -0700
From: John Stultz <john.stultz@...aro.org>
To: brookxu <brookxu.cn@...il.com>
Cc: yanghui <yanghui.def@...edance.com>,
Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>,
lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] Clocksource: Avoid misjudgment of clocksource
On Tue, Oct 12, 2021 at 1:06 AM brookxu <brookxu.cn@...il.com> wrote:
> John Stultz wrote on 2021/10/12 13:29:
> > On Mon, Oct 11, 2021 at 10:23 PM brookxu <brookxu.cn@...il.com> wrote:
> >> John Stultz wrote on 2021/10/12 12:52 下午:
> >>> On Sat, Oct 9, 2021 at 7:04 AM brookxu <brookxu.cn@...il.com> wrote:
> >> If we record the watchdog's start_time in clocksource_start_watchdog(), and then
> >> when we verify cycles in clocksource_watchdog(), check whether the clocksource
> >> watchdog is blocked. Due to MSB verification, if the blocked time is greater than
> >> half of the watchdog timer max_cycles, then we can safely ignore the current
> >> verification? Do you think this idea is okay?
> >
> > I can't say I totally understand the idea. Maybe could you clarify with a patch?
> >
>
> Sorry, it looks almost as follows:
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index b8a14d2..87f3b67 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -119,6 +119,7 @@
> static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
> static DEFINE_SPINLOCK(watchdog_lock);
> static int watchdog_running;
> +static unsigned long watchdog_start_time;
> static atomic_t watchdog_reset_pending;
>
> static inline void clocksource_watchdog_lock(unsigned long *flags)
> @@ -356,6 +357,7 @@ static void clocksource_watchdog(struct timer_list *unused)
> int next_cpu, reset_pending;
> int64_t wd_nsec, cs_nsec;
> struct clocksource *cs;
> + unsigned long max_jiffies;
> u32 md;
>
> spin_lock(&watchdog_lock);
> @@ -402,6 +404,10 @@ static void clocksource_watchdog(struct timer_list *unused)
> if (atomic_read(&watchdog_reset_pending))
> continue;
>
> + max_jiffies = nsecs_to_jiffies(cs->max_idle_ns);
> + if (time_is_before_jiffies(watchdog_start_time + max_jiffies))
> + continue;
> +
Sorry, what is the benefit of using jiffies here? Jiffies are
updated by counting the number of tick intervals on the current
clocksource.
This seems like circular logic, where we're trying to judge the
current clocksource by using something we derived from the current
clocksource.
That's why the watchdog clocksource is important, as it's supposed to
be a separate counter that is more reliable (but likely slower) then
the preferred clocksource.
So I'm not really sure how this helps.
The earlier patch by yanghui at least used the watchdog interval to
decide if the watchdog timer had expired late. Which seemed
reasonable, but I thought it might be helpful to add some sort of a
counter so if the case is happening repeatedly (timers constantly
being delayed) we have a better signal that the watchdog and current
clocksource are out of sync. Because again, timers are fired based on
the current clocksource. So constant delays likely mean things are
wrong.
thanks
-john
thanks
-john
Powered by blists - more mailing lists