netdev - Re: [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation

Message-ID: <20200108105011.GY2827@hirez.programming.kicks-ass.net>
Date:   Wed, 8 Jan 2020 11:50:11 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Anchal Agarwal <anchalag@...zon.com>
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
        x86@...nel.org, boris.ostrovsky@...cle.com, jgross@...e.com,
        linux-pm@...r.kernel.org, linux-mm@...ck.org, kamatam@...zon.com,
        sstabellini@...nel.org, konrad.wilk@...cle.co,
        roger.pau@...rix.com, axboe@...nel.dk, davem@...emloft.net,
        rjw@...ysocki.net, len.brown@...el.com, pavel@....cz,
        eduval@...zon.com, sblbir@...zon.com,
        xen-devel@...ts.xenproject.org, vkuznets@...hat.com,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        Woodhouse@...-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com,
        dwmw@...zon.co.uk, fllinden@...ozn.com
Subject: Re: [RFC PATCH V2 11/11] x86: tsc: avoid system instability in
 hibernation

On Tue, Jan 07, 2020 at 11:45:26PM +0000, Anchal Agarwal wrote:
> From: Eduardo Valentin <eduval@...zon.com>
> 
> System instability is seen during resume from hibernation when the
> system is under heavy CPU load. This is due to the sched clock data
> not being updated, so the scheduler thinks that heavy CPU hog
> tasks need more time on the CPU, causing the system to freeze
> during the unfreezing of tasks. For example, threaded irqs
> and kernel processes servicing the network interface may be delayed
> for several tens of seconds, causing the system to be unreachable.

> The fix for this situation is to mark the sched clock as unstable
> as early as possible in the resume path, leaving it unstable
> for the duration of the resume process. This will force the
> scheduler to attempt to align the sched clock across CPUs using
> the delta with time of day, updating sched clock data. In a post
> hibernation event, we can then mark the sched clock as stable
> again, avoiding unnecessary syncs with time of day on systems
> in which TSC is reliable.

This makes no frigging sense what so bloody ever. If the clock is
stable, we don't care about sched_clock_data. When it is stable you get
a linear function of the TSC without any complicated bits on top.

When it is unstable, only then do we care about the sched_clock_data.
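
To make the stable/unstable distinction above concrete, here is a rough
user-space sketch, loosely modeled on the shape of kernel/sched/clock.c.
It is not the kernel code: every model_* name and MODEL_TICK_NSEC is
made up for illustration, and the 4 ms tick is an assumption. The point
is only that the stable path never reads the per-CPU data, while the
unstable path rebuilds the clock from it and clamps the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick length in nanoseconds (hypothetical value: 4 ms). */
#define MODEL_TICK_NSEC 4000000ULL

/* Hypothetical stand-in for the per-CPU sched_clock_data. */
struct model_scd {
	uint64_t tick_raw;	/* raw clock sampled at the last tick */
	uint64_t tick_gtod;	/* time of day sampled at the last tick */
	uint64_t clock;		/* last value handed out on this CPU */
};

static uint64_t model_max(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t model_min(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Stable case: a linear function of the raw clock, scd is not consulted. */
static uint64_t model_clock_stable(uint64_t raw, uint64_t offset)
{
	return raw + offset;
}

/* Unstable case: rebuild the clock from the per-CPU data and clamp it. */
static uint64_t model_clock_unstable(struct model_scd *scd, uint64_t raw)
{
	/* Raw time elapsed since the last tick, never negative. */
	uint64_t delta = raw >= scd->tick_raw ? raw - scd->tick_raw : 0;
	uint64_t clock = scd->tick_gtod + delta;

	/* Never go backwards, never run more than one tick ahead of gtod. */
	uint64_t min_clock = model_max(scd->tick_gtod, scd->clock);
	uint64_t max_clock = model_max(scd->clock,
				       scd->tick_gtod + MODEL_TICK_NSEC);

	clock = model_max(clock, min_clock);
	clock = model_min(clock, max_clock);
	scd->clock = clock;
	return clock;
}

uint64_t model_sched_clock_cpu(bool stable, struct model_scd *scd,
			       uint64_t raw, uint64_t offset)
{
	if (stable)
		return model_clock_stable(raw, offset);
	return model_clock_unstable(scd, raw);
}
```

In this model, stale per-CPU data can only affect the result when the
stable flag is false.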

> Reviewed-by: Erik Quanstrom <quanstro@...zon.com>
> Reviewed-by: Frank van der Linden <fllinden@...zon.com>
> Reviewed-by: Balbir Singh <sblbir@...zon.com>
> Reviewed-by: Munehisa Kamata <kamatam@...zon.com>
> Tested-by: Anchal Agarwal <anchalag@...zon.com>
> Signed-off-by: Eduardo Valentin <eduval@...zon.com>
> ---

NAK, the code very much relies on never getting marked stable again
after it gets set to unstable.
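
As a sketch of what "never getting marked stable again" means, here is
a simplified user-space model of a one-way latch. It assumes the only
place that sets the stable flag is the late init step, as in the
unpatched hunk quoted below. The struct and all model_* names are
hypothetical; the real code uses a static key and more state than this.

```c
#include <stdbool.h>

/* Hypothetical model of the stability state; not the kernel's layout. */
struct model_stability {
	bool stable_early;	/* assumed stable until something says otherwise */
	bool stable;		/* only ever set by the late init step */
	bool late_init_done;
};

struct model_stability model_stability_init(void)
{
	struct model_stability s = {
		.stable_early = true,
		.stable = false,
		.late_init_done = false,
	};
	return s;
}

/* Marking the clock unstable is allowed at any time and is permanent. */
void model_clear_stable(struct model_stability *s)
{
	s->stable_early = false;
	s->stable = false;
}

/* The single point where the clock can become stable, run at most once. */
void model_init_late(struct model_stability *s)
{
	if (s->late_init_done)
		return;
	s->late_init_done = true;
	if (s->stable_early)
		s->stable = true;
}
```

Exporting a setter that can be called after model_clear_stable() would
break the one-way property that the rest of this model leans on, which
is the objection in a nutshell.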

> diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
> index 1152259a4ca0..374d40e5b1a2 100644
> --- a/kernel/sched/clock.c
> +++ b/kernel/sched/clock.c
> @@ -116,7 +116,7 @@ static void __scd_stamp(struct sched_clock_data *scd)
>  	scd->tick_raw = sched_clock();
>  }
>  
> -static void __set_sched_clock_stable(void)
> +void set_sched_clock_stable(void)
>  {
>  	struct sched_clock_data *scd;
>  
> @@ -236,7 +236,7 @@ static int __init sched_clock_init_late(void)
>  	smp_mb(); /* matches {set,clear}_sched_clock_stable() */
>  
>  	if (__sched_clock_stable_early)
> -		__set_sched_clock_stable();
> +		set_sched_clock_stable();
>  
>  	return 0;
>  }
> -- 
> 2.15.3.AMZN
>