Message-ID: <61381153-634e-489b-848f-7077ce46049a@bytedance.com>
Date: Tue, 19 Oct 2021 12:14:07 +0800
From: yanghui <yanghui.def@...edance.com>
To: John Stultz <john.stultz@...aro.org>,
brookxu <brookxu.cn@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>,
lkml <linux-kernel@...r.kernel.org>, shli@...com
Subject: [PATCH] Clocksource: Avoid misjudgment of clocksource
On 2021/10/19 12:14 AM, John Stultz wrote:
> On Tue, Oct 12, 2021 at 1:06 AM brookxu <brookxu.cn@...il.com> wrote:
>> John Stultz wrote on 2021/10/12 13:29:
>>> On Mon, Oct 11, 2021 at 10:23 PM brookxu <brookxu.cn@...il.com> wrote:
>>>>> John Stultz wrote on 2021/10/12 12:52 PM:
>>>>> On Sat, Oct 9, 2021 at 7:04 AM brookxu <brookxu.cn@...il.com> wrote:
>>>> What if we record the watchdog's start_time in clocksource_start_watchdog(), and
>>>> then, when verifying cycles in clocksource_watchdog(), check whether the
>>>> clocksource watchdog was blocked? Because of the MSB verification, if the
>>>> blocked time is greater than half of the watchdog timer's max_cycles, we can
>>>> safely ignore the current verification. Do you think this idea is okay?
>>>
>>> I can't say I totally understand the idea. Maybe could you clarify with a patch?
>>>
>>
>> Sorry, it looks almost as follows:
>>
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index b8a14d2..87f3b67 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -119,6 +119,7 @@
>> static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
>> static DEFINE_SPINLOCK(watchdog_lock);
>> static int watchdog_running;
>> +static unsigned long watchdog_start_time;
>> static atomic_t watchdog_reset_pending;
>>
>> static inline void clocksource_watchdog_lock(unsigned long *flags)
>> @@ -356,6 +357,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>> int next_cpu, reset_pending;
>> int64_t wd_nsec, cs_nsec;
>> struct clocksource *cs;
>> + unsigned long max_jiffies;
>> u32 md;
>>
>> spin_lock(&watchdog_lock);
>> @@ -402,6 +404,10 @@ static void clocksource_watchdog(struct timer_list *unused)
>> if (atomic_read(&watchdog_reset_pending))
>> continue;
>>
>> + max_jiffies = nsecs_to_jiffies(cs->max_idle_ns);
>> + if (time_is_before_jiffies(watchdog_start_time + max_jiffies))
>> + continue;
>> +
>
> Sorry, what is the benefit of using jiffies here? Jiffies are
> updated by counting the number of tick intervals on the current
> clocksource.
>
> This seems like circular logic, where we're trying to judge the
> current clocksource by using something we derived from the current
> clocksource.
> That's why the watchdog clocksource is important, as it's supposed to
> be a separate counter that is more reliable (but likely slower) than
> the preferred clocksource.
>
> So I'm not really sure how this helps.
>
> The earlier patch by yanghui at least used the watchdog interval to
> decide if the watchdog timer had expired late. Which seemed
> reasonable, but I thought it might be helpful to add some sort of a
> counter so if the case is happening repeatedly (timers constantly
> being delayed) we have a better signal that the watchdog and current
> clocksource are out of sync. Because again, timers are fired based on
I think having only a signal is not enough; we need to prevent the
clocksource from being incorrectly switched.
The timer callback clocksource_watchdog() is executed in softirq
context (run_timer_softirq()). So if softirq is disabled for a long
time (one such situation is a long softlockup), clocksource_watchdog()
will be executed late.
> the current clocksource. So constant delays likely mean things are
> wrong.
>
> thanks
> -john
>
I think it will be better to add this to my patch:
/*
* Interval: 0.5sec.
- * MaxInterval: 1s.
+ * MaxInterval: 20s.
*/
#define WATCHDOG_INTERVAL (HZ >> 1)
-#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
+#define WATCHDOG_MAX_INTERVAL_NS (20 * NSEC_PER_SEC)
thanks
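The proposed bound could be applied roughly as in the following userspace sketch; only the two macro values come from the hunk above, while watchdog_interval_valid() and the surrounding framing are hypothetical, assumed for illustration.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* proposed bound from the hunk above: 20s instead of 1s */
#define WATCHDOG_MAX_INTERVAL_NS (20 * NSEC_PER_SEC)

/*
 * Sketch (userspace model, not the kernel function): if the measured
 * watchdog interval wd_nsec is itself larger than the allowed maximum,
 * the watchdog timer must have been delayed (e.g. softirq blocked for
 * a long time), so the skew comparison should be skipped for this
 * round rather than used to disqualify the clocksource.
 */
static int watchdog_interval_valid(int64_t wd_nsec)
{
	return wd_nsec < WATCHDOG_MAX_INTERVAL_NS;
}
```

A normal 0.5s interval passes the check, while a 25s interval (e.g. after a long softlockup) is treated as an invalid measurement instead of evidence of clocksource skew.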