linux-kernel - Re: [PATCH 1/1] x86: tsc: avoid system instability in hibernation


Message-ID: <20180727075100.GZ2494@hirez.programming.kicks-ass.net>
Date:   Fri, 27 Jul 2018 09:51:00 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Eduardo Valentin <eduval@...zon.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Dou Liyang <douly.fnst@...fujitsu.com>,
        Len Brown <len.brown@...el.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        "mike.travis@....com" <mike.travis@....com>,
        Rajvi Jingar <rajvi.jingar@...el.com>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        Philippe Ombredanne <pombredanne@...b.com>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] x86: tsc: avoid system instability in hibernation

On Wed, Jul 25, 2018 at 11:18:46AM -0700, Eduardo Valentin wrote:
> System instability is seen during resume from hibernation when the
> system is under heavy CPU load. This is due to sched clock data not
> being updated, so the scheduler concludes that heavy CPU hog tasks
> need more CPU time, causing the system to freeze while tasks are
> being unfrozen. For example, threaded IRQs and kernel processes
> servicing network interfaces may be delayed for several tens of
> seconds, making the system unreachable.

> +static int tsc_pm_notifier(struct notifier_block *notifier,
> +			   unsigned long pm_event, void *unused)
> +{
> +	switch (pm_event) {
> +	case PM_HIBERNATION_PREPARE:
> +		clear_sched_clock_stable();
> +		break;
> +	case PM_POST_HIBERNATION:
> +		/* Set back to the default */
> +		if (!check_tsc_unstable())
> +			set_sched_clock_stable();
> +		break;
> +	}

I've not looked at this in detail yet, but this is an absolute no go,
not going to happen, full stop.

If we _ever_ mark the thing unstable, that's it, the end. Allowing it to
go back to stable is a source of utter fail.
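
Editorial illustration (not part of the original message): the rule stated above, that a clock once marked unstable must never return to stable, amounts to a one-way latch. The following is a minimal user-space sketch of that idea only. The type `struct clock_state` and the functions `clock_state_init()`, `clock_mark_unstable()` and `clock_try_mark_stable()` are hypothetical names invented for this example; they are not kernel APIs and do not reflect how the kernel's sched clock code is actually written.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of a "sticky unstable" flag.
 * This is an assumption-laden sketch, not the kernel's implementation.
 */
struct clock_state {
	bool stable;        /* current view of the clock */
	bool ever_unstable; /* sticky: set once, never cleared */
};

/* Start out trusting the clock. */
static void clock_state_init(struct clock_state *cs)
{
	cs->stable = true;
	cs->ever_unstable = false;
}

/* Marking the clock unstable is always allowed and is permanent. */
static void clock_mark_unstable(struct clock_state *cs)
{
	cs->ever_unstable = true;
	cs->stable = false;
}

/*
 * Attempt to mark the clock stable. The request is refused if the
 * clock has ever been marked unstable, so there is no path back.
 * Returns true only if the clock is stable after the call.
 */
static bool clock_try_mark_stable(struct clock_state *cs)
{
	if (cs->ever_unstable)
		return false;
	cs->stable = true;
	return true;
}
```

In this model, a caller that runs `clock_mark_unstable()` before hibernation and `clock_try_mark_stable()` afterwards would find the second call refused, which is the behaviour the reply argues for and the opposite of what the quoted notifier attempts.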