Message-ID: <alpine.DEB.2.21.1901292331380.1950@nanos.tec.linutronix.de>
Date:   Wed, 30 Jan 2019 00:02:37 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Jan H. Schönherr <jan@...nhrr.de>
cc:     Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>,
        x86@...nel.org, Paul Menzel <pmenzel@...gen.mpg.de>,
        Thomas Lendacky <Thomas.Lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] cpu/hotplug: Unfreeze sibling CPU first on resume
 from S3

Jan,

On Tue, 29 Jan 2019, Jan H. Schönherr wrote:

> At least one system declares the TSC unstable after resume from S3,
> because the TSC is observed going backwards up to roughly 500 cycles
> every now and then, when bringing secondary CPUs back online.
> 
> The system in question is an AMD Ryzen Threadripper 2950X, microcode
> 0x800820b, on an ASRock Fatal1ty X399 Professional Gaming, BIOS P3.30.
> 
> This unexplained behavior goes away as soon as the sibling CPU of the
> boot CPU is brought back up. Hence, add a hack to restore the sibling
> CPU before all others on unfreeze. This keeps the TSC stable.

Uurgh, no. As you said that's a hack and I'm pretty sure that it just works
by chance. It merely makes the underlying wreckage no longer observable.

I'm pretty sure this is a BIOS bug and I'm really not going to make a
special case here just to accommodate that particular broken firmware.
This would just set a precedent for random ordering requests based on DMI
strings and other data to make it work on all kinds of broken
motherboard/firmware/microcode combinations.

Surely nice detective work, but we really don't want to open this can of
worms.

Too bad that AMD does not have the TSC_ADJUST register. It would tell you
immediately what's wrong and the code we have for that would probably cure
the mess.

Sigh, I've been complaining to both Intel and AMD for more than 20 years
now about the complete trainwreck they made out of TSC and it's still not
fixed. Though I still have the illusion that by the time I retire I will
get my hands on a machine with a sane TSC implementation. Hope dies last ....

Oh well, enough ranting. With that I hand off the further proceedings to
Tom Lendacky, who surely can give you more technical help than I can in
that particular matter.

Thanks,

	tglx





