linux-kernel - Re: [PATCH 2/2] selftests/x86/fsgsbase: Default to trying to run the test repeatedly


Message-ID: <20190211084916.GB62722@gmail.com>
Date:   Mon, 11 Feb 2019 09:49:16 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Mark Brown <broonie@...nel.org>
Cc:     Shuah Khan <shuah@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H . Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...capital.net>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        linux-kselftest@...r.kernel.org, Dan Rue <dan.rue@...aro.org>
Subject: Re: [PATCH 2/2] selftests/x86/fsgsbase: Default to trying to run the
 test repeatedly


* Mark Brown <broonie@...nel.org> wrote:

> In automated testing it has been found that on many systems the fsgsbase
> test fails intermittently.  This was reported and discussed a while
> back:
> 
>     https://lore.kernel.org/lkml/20180126153631.ha7yc33fj5uhitjo@xps/
> 
> with the analysis concluding that this is a hardware issue affecting a
> subset of systems but no fix has been merged as yet.  As well as the
> actual problem found by testing the intermittent test failure is causing
> issues for the people doing the automated testing due to the noise.
> 
> In order to make the testing stable modify the test program to iterate
> through the test repeatedly, choosing 5000 iterations based on prior
> reports and local testing.  This unfortunately greatly increases the
> execution time for the selftests when things succeed which isn't great,
> in my local tests on a range of systems it pushes the execution time up
> to approximately a minute when no failures are encountered.
> 
> Reported-by: Dan Rue <dan.rue@...aro.org>
> Signed-off-by: Mark Brown <broonie@...nel.org>
> ---
>  tools/testing/selftests/x86/fsgsbase.c | 27 +++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
> index 6cda6daa1f8c..83410749ff1f 100644
> --- a/tools/testing/selftests/x86/fsgsbase.c
> +++ b/tools/testing/selftests/x86/fsgsbase.c
> @@ -379,7 +379,7 @@ static void test_unexpected_base(void)
>  	}
>  }
>  
> -int main()
> +int test()
>  {
>  	pthread_t thread;
>  
> @@ -437,3 +437,28 @@ int main()
>  
>  	return nerrs == 0 ? 0 : 1;
>  }
> +
> +int main()
> +{
> +	int tries = 5000;
> +	int i;
> +
> +	if (tries > 1)
> +		quiet = true;
> +
> +	for (i = 0; i < tries; i++) {
> +		if (test() != 0)
> +			break;
> +	}
> +
> +	if (quiet) {
> +		if (nerrs) {
> +			printf("[FAIL] %d errors detected in %d tries\n",
> +				nerrs, i + 1);
> +		} else {
> +			printf("[PASS] %d runs succeeded\n", i);
> +		}
> +	}
> +
> +	return nerrs == 0 ? 0 : 1;
> +}

So this isn't very user-friendly either: previously it would run a 
testcase and immediately provide output.

Now it just starts and appears to 'hang':

  galatea:~/linux/linux/tools/testing/selftests/x86> ./fsgsbase_64 

I got bored and Ctrl-C-ed it after ~30 seconds.

How long is this supposed to run, and why isn't the user informed?
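
As an illustration only (this is not part of the posted patch), a
repeated run could announce its iteration count up front and print
periodic progress. The names run_one_test(), run_with_progress() and
the report_every parameter are hypothetical; run_one_test() stands in
for the existing test body:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the real test body: 0 = pass, nonzero = fail. */
static int run_one_test(void)
{
	return 0;
}

/*
 * Run the test up to 'tries' times, telling the user what to expect.
 * Prints a progress line every 'report_every' tries (0 disables it).
 * Returns the number of tries that passed; stops at the first failure.
 */
static int run_with_progress(int tries, int report_every)
{
	int i;

	assert(tries > 0);

	printf("[RUN]\trepeating test up to %d times\n", tries);
	fflush(stdout);

	for (i = 0; i < tries; i++) {
		if (run_one_test() != 0) {
			printf("[FAIL]\tfailure on try %d of %d\n", i + 1, tries);
			return i;
		}
		if (report_every > 0 && (i + 1) % report_every == 0) {
			printf("[INFO]\t%d/%d tries done\n", i + 1, tries);
			fflush(stdout);
		}
	}

	printf("[PASS]\t%d runs succeeded\n", i);
	return i;
}
```

Calling run_with_progress(5000, 1000) from main() would keep the
output short while still showing that the test is alive.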

Also, testcases should really be short, so I think a better approach 
would be to thread the test-case and start an instance on every CPU. That 
should also exercise SMP bugs, if any.
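
A rough sketch of that per-CPU approach, under stated assumptions: it
is not taken from fsgsbase.c, the names run_one_test(), struct worker
and run_on_all_cpus() are hypothetical, and it presumes the test body
is safe to run concurrently (the existing test shares global state, so
it would need adapting first). One thread is started per CPU in the
process's affinity mask, and each thread pins itself to its CPU:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thread-safe test body: 0 = pass. */
static int run_one_test(void)
{
	return 0;
}

struct worker {
	pthread_t thread;
	int cpu;		/* CPU this worker pins itself to */
	int iterations;		/* how many times to run the test */
	int failures;		/* written only by the owning thread */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;
	cpu_set_t set;
	int i;

	CPU_ZERO(&set);
	CPU_SET(w->cpu, &set);
	/* Best effort: if pinning fails the test still runs, just unpinned. */
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	for (i = 0; i < w->iterations; i++)
		if (run_one_test() != 0)
			w->failures++;

	return NULL;
}

/* Returns the total failure count across all CPUs, or -1 on setup error. */
static int run_on_all_cpus(int iterations)
{
	cpu_set_t allowed;
	struct worker *workers;
	int ncpus, cpu, i, n = 0, failures = 0;

	assert(iterations > 0);

	/* Only use the CPUs this process is actually allowed to run on. */
	if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
		return -1;

	ncpus = CPU_COUNT(&allowed);
	workers = calloc(ncpus, sizeof(*workers));
	if (!workers)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE && n < ncpus; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		workers[n].cpu = cpu;
		workers[n].iterations = iterations;
		if (pthread_create(&workers[n].thread, NULL,
				   worker_fn, &workers[n]) != 0)
			break;
		n++;
	}

	/* Join only the threads that were started; sum after each join. */
	for (i = 0; i < n; i++) {
		pthread_join(workers[i].thread, NULL);
		failures += workers[i].failures;
	}

	free(workers);
	return n == ncpus ? failures : -1;
}
```

Built with -pthread, this splits the work across CPUs, so the
wall-clock time for a given number of total iterations should drop
roughly with the CPU count, and the instances run concurrently.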

Thanks,

	Ingo