linux-kernel - Re: [PATCH v2] x86/tsc: Allow quick PIT calibration despite interruptions


Message-ID: <20190215093618.GA84754@gmail.com>
Date:   Fri, 15 Feb 2019 10:36:18 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Jan H. Schönherr <jan@...nhrr.de>
Cc:     Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
        Paul Menzel <pmenzel@...gen.mpg.de>,
        Thomas Lendacky <Thomas.Lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86/tsc: Allow quick PIT calibration despite
 interruptions


* Jan H. Schönherr <jan@...nhrr.de> wrote:

> Some systems experience regular interruptions (60 Hz SMI?), that prevent
> the quick PIT calibration from succeeding: individual interruptions can be
> so long, that the PIT MSB is observed to decrement by 2 or 3 instead of 1.
> The existing code cannot recover from this.
> 
> The system in question is an AMD Ryzen Threadripper 2950X, microcode
> 0x800820b, on an ASRock Fatal1ty X399 Professional Gaming, BIOS P3.30.
> 
> Change the code to handle (almost) arbitrary interruptions, as long
> as they happen only once in a while and they do not take too long.
> Specifically, also cover an interruption during the very first reads.
> 
> Signed-off-by: Jan H. Schönherr <jan@...nhrr.de>
> ---
> 
> v2:
> - Dropped the other hacky patch for the time being.
> - Fixed the early exit check.
> - Hopefully fixed all inaccurate math in v1.
> - Extended comments.
> 
>  arch/x86/kernel/tsc.c | 91 +++++++++++++++++++++++++++----------------
>  1 file changed, 57 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index e9f777bfed40..aced427371f7 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -485,7 +485,7 @@ static inline int pit_verify_msb(unsigned char val)
>  static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *deltap)
>  {
>  	int count;
> -	u64 tsc = 0, prev_tsc = 0;
> +	u64 tsc = get_cycles(), prev_tsc = 0;
>  
>  	for (count = 0; count < 50000; count++) {
>  		if (!pit_verify_msb(val))
> @@ -500,7 +500,7 @@ static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *de
>  	 * We require _some_ success, but the quality control
>  	 * will be based on the error terms on the TSC values.
>  	 */
> -	return count > 5;
> +	return count > 0 && pit_verify_msb(val - 1);
>  }
>  
>  /*
> @@ -515,7 +515,8 @@ static inline int pit_expect_msb(unsigned char val, u64 *tscp, unsigned long *de
>  static unsigned long quick_pit_calibrate(void)
>  {
>  	int i;
> -	u64 tsc, delta;
> +	u64 tsc = 0, delta;
> +	unsigned char start;
>  	unsigned long d1, d2;
>  
>  	if (!has_legacy_pic())
> @@ -547,43 +548,65 @@ static unsigned long quick_pit_calibrate(void)
>  	 */
>  	pit_verify_msb(0);
>  
> -	if (pit_expect_msb(0xff, &tsc, &d1)) {
> -		for (i = 1; i <= MAX_QUICK_PIT_ITERATIONS; i++) {
> -			if (!pit_expect_msb(0xff-i, &delta, &d2))
> -				break;
> -
> -			delta -= tsc;
> -
> -			/*
> -			 * Extrapolate the error and fail fast if the error will
> -			 * never be below 500 ppm.
> -			 */
> -			if (i == 1 &&
> -			    d1 + d2 >= (delta * MAX_QUICK_PIT_ITERATIONS) >> 11)
> -				return 0;
> -
> -			/*
> -			 * Iterate until the error is less than 500 ppm
> -			 */
> -			if (d1+d2 >= delta >> 11)
> -				continue;
> -
> -			/*
> -			 * Check the PIT one more time to verify that
> -			 * all TSC reads were stable wrt the PIT.
> -			 *
> -			 * This also guarantees serialization of the
> -			 * last cycle read ('d2') in pit_expect_msb.
> -			 */
> -			if (!pit_verify_msb(0xfe - i))
> -				break;
> -			goto success;
> +	/*
> +	 * Reading the PIT may fail or experience unexpected delays (due to
> +	 * SMIs, for example). Assuming, that these underlying interruptions
> +	 * happen only once in a while, we wait for two successful reads.
> +	 * Of these, we assume that the better one was not delayed and use
> +	 * it as the base for later calculations.
> +	 */
> +	for (i = 0; i <= MAX_QUICK_PIT_ITERATIONS; i++) {
> +		if (!pit_expect_msb(0xff - i, &delta, &d2))
> +			continue;
> +
> +		if (!tsc) {
> +			/* first success */
> +			start = i;
> +			tsc = delta;
> +			d1 = d2;
> +			continue;
>  		}


The logic looks mostly good to me, but do we really want to use 'delta' 
as an implicit success-counter here? In principle 'delta' could end up 
being 0 due to some TSC borkage, and we'd interpret that as "first 
success", which it clearly isn't.

The end result will still be a 'failure', but why not use a proper 
separate variable to count attempts and make the code easier to read and 
failure scenarios more predictable?
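
To make the suggestion concrete, here is a minimal user-space sketch of
tracking successes with an explicit counter instead of testing a TSC
value against 0. It is not the kernel code: 'struct sample',
'struct base' and find_base() are hypothetical names, and each sample
merely stands in for the result of one pit_expect_msb() call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the outcome of one pit_expect_msb() call. */
struct sample {
	bool ok;		/* did the read succeed? */
	uint64_t tsc;		/* TSC value reported for the read */
	unsigned long d;	/* error term of the read */
};

/* Hypothetical result: the first successful read plus a success count. */
struct base {
	int successes;		/* number of successful reads seen */
	size_t start;		/* index of the first successful read */
	uint64_t tsc;		/* TSC value of the first successful read */
	unsigned long d;	/* error term of the first successful read */
};

/*
 * Walk the samples and record the first successful one as the base.
 * "First success" is decided by the explicit counter, so a TSC value
 * of 0 is stored like any other value and cannot be mistaken for
 * "no success yet".
 */
static struct base find_base(const struct sample *s, size_t n)
{
	struct base b = { 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (!s[i].ok)
			continue;

		if (b.successes++ == 0) {
			/* first success */
			b.start = i;
			b.tsc = s[i].tsc;
			b.d = s[i].d;
		}
	}

	return b;
}
```

With this shape, a successful read that happens to report a TSC of 0
is still counted exactly once as the first success, and later reads
are not misinterpreted as the first one.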

Thanks,

	Ingo