Message-ID: <alpine.DEB.2.20.1706211645340.2328@nanos>
Date:   Wed, 21 Jun 2017 17:12:06 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Kan Liang <kan.liang@...el.com>
cc:     linux-kernel@...r.kernel.org, dzickus@...hat.com, mingo@...nel.org,
        akpm@...ux-foundation.org, babu.moger@...cle.com,
        atomlin@...hat.com, prarit@...hat.com,
        torvalds@...ux-foundation.org, peterz@...radead.org,
        eranian@...gle.com, acme@...hat.com, ak@...ux.intel.com,
        stable@...r.kernel.org
Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups
On Wed, 21 Jun 2017, kan.liang@...el.com wrote:
>  
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> +/*
> + * The NMI watchdog relies on PERF_COUNT_HW_CPU_CYCLES event, which
> + * can tick faster than the measured CPU Frequency due to Turbo mode.
> + * That can lead to spurious timeouts.
> + * To workaround the issue, extending the period by 3 times.
> + */
>  u64 hw_nmi_get_sample_period(int watchdog_thresh)
>  {
> -	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> +	return (u64)(cpu_khz) * 1000 * watchdog_thresh * 3;
The maximum turbo frequency of any given machine can be retrieved.
So why don't you simply take that ratio into account and apply it for the
machines which have those insane turbo loaders? That's not a huge effort,
can be easily backported and does not inflict this unconditionally.
So what you want is:
	return get_max_turbo_khz() * 1000 * watchdog_thresh;
Where get_max_turbo_khz() by default returns cpu_khz for non turbo
motors.
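To illustrate why the base-frequency period fires early, here is a minimal userspace sketch of the arithmetic. The helper names sample_period() and wall_msec() and the 2 GHz base / 3 GHz turbo figures are assumptions made up for the example, not kernel code or measured values.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles the counter must accumulate before the watchdog NMI fires. */
static uint64_t sample_period(uint64_t khz, unsigned int thresh_sec)
{
	return khz * 1000 * thresh_sec;
}

/*
 * Wall-clock milliseconds until the counter reaches 'period' when the
 * core actually runs at run_khz (kHz is cycles per millisecond).
 */
static uint64_t wall_msec(uint64_t period, uint64_t run_khz)
{
	assert(run_khz != 0);
	return period / run_khz;
}
```

With a hypothetical cpu_khz of 2000000 and a 10 second threshold, a core running at 3000000 kHz reaches the period after roughly 6.7 seconds of wall time. Sizing the period from the maximum turbo frequency gives 10 seconds at full turbo and 15 seconds at base frequency.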
And instead of silently doing this it should emit an info message into dmesg:
	u64 period, max_khz = get_max_turbo_khz();
	static int once;
	period = max_khz * 1000 * watchdog_thresh;
	if (max_khz != cpu_khz && !once) {
		unsigned int msec = period / cpu_khz;
		once = 1;
		pr_info("Adjusted watchdog threshold to %u.%03u sec\n",
			msec / 1000, msec % 1000);
	}
	return period;
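The message formatting can be checked in userspace with a sketch like the following. format_threshold() is a hypothetical stand-in for the pr_info() call, and snprintf() replaces the kernel printing path. Since msec % 1000 never exceeds three digits, the sketch pads the fraction to three places.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the adjusted threshold as seconds with a millisecond fraction.
 * period is in cycles, cpu_khz in cycles per millisecond, so the
 * quotient is the threshold in milliseconds at base frequency.
 */
static int format_threshold(char *buf, size_t len,
			    uint64_t period, uint64_t cpu_khz)
{
	unsigned int msec;

	assert(cpu_khz != 0);
	msec = (unsigned int)(period / cpu_khz);

	return snprintf(buf, len,
			"Adjusted watchdog threshold to %u.%03u sec",
			msec / 1000, msec % 1000);
}
```

For a period of 30000000000 cycles and an assumed cpu_khz of 2000000 this produces "Adjusted watchdog threshold to 15.000 sec".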
Hmm?
Thanks,
	tglx