lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.20.1709260917520.1941@nanos>
Date:   Tue, 26 Sep 2017 09:28:23 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Mike Travis <mike.travis@....com>
cc:     Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Bin Gao <bin.gao@...ux.intel.com>,
        Prarit Bhargava <prarit@...hat.com>,
        Dimitri Sivanich <dimitri.sivanich@....com>,
        Andrew Banman <andrew.banman@....com>,
        Russ Anderson <russ.anderson@....com>,
        linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being
 non-null is valid

On Mon, 25 Sep 2017, Mike Travis wrote:
> On 9/25/2017 11:10 AM, Thomas Gleixner wrote:
> > The reason why I put it there in the first place was to make the TSC
> > deadline timer work on a whole range of systems. It turned out that our
> > 'fix' was not enough so we changed that later to disable the deadline timer
> > completely on affected systems when the firmware does not contain a fix for
> > it. So there is no real technical reason anymore to enforce TSC_ADJUST == 0
> > on the boot CPU. So rather than special casing this for UV we should just
> > remove that requirement and leave the boot value as is.
> 
> Okay, I could do that but as I mentioned I'm not comfortable changing code
> that I cannot personally verify is correct.  I'm assuming you only mean
> removing the part where socket 0 should have a TSC_ADJUST value of 0?  And the
> part mentioned below that all cpu threads on a socket should have the same
> adjust values should not be changed?

Right, only the restriction of TSC_ADJUST == 0 on the boot CPU can be
removed. If the TSC_ADJUST values in a package differ, then BIOS did
something really stupid because the underlying counter is the same for all
threads in a package and different TSC_ADJUST values will result in TSC out
of sync being detected.

> > > Our BIOS team did make a change to conform to the "TSC_ADJUST should be
> > > the
> > > same on all cpu threads on a single socket" requirement, so we were able
> > > to
> > > pass that part of the TSC validation functions.  (Prior to this, the TSC's
> > > were synced by writing directly to the TSC MSR and natural delays in the
> > > processor firmware caused the slight differences in the TSC ADJUST
> > > values.)
> > 
> > Right. TSC_ADJUST is there for a reason.
> 
> Well, there is the problem with bit 63 (sign bit) in the TSC ADJUST MSR. So
> it's almost there... :)

What's the problem with bit 63?

> > But back to my question about ART. You might talk to your BIOS/HW folks
> > about that and eventually disable the ART related functionality in the
> > kernel on UV.
> 
> Once again, I'm not sure what code you are talking about.  I talked to the
> BIOS engineer specifically working on TSC related items and he was also not
> familiar with the Intel ART spec.  I could go farther up the chain to the
> hardware designers but right now this would be extremely low on their priority
> queue.
> 
> Perhaps if you were more specific in what code should be disabled?  We would
> like to keep as much as the self check code as possible as there are still
> some remotely possible failure cases.  And removing the self checks would have
> a negative impact on reliability.

detect_art() -> X86_FEATURE_ART

X86_FEATURE_ART enables the hardware assisted PTP <-> TSC correlation in
network cards and is used for network synchronized audio stuff. It's
probably a non issue today but TSN is becoming more widespread so I expect
more users in the foreseeable future. We can just drop out in detect_art()
if the system is UV so X86_FEATURE_ART does not get set and all related
functionality is disabled automagically.

> I actually did test using the CPU flags, "TSC reliable" and no "TSC ADJUST" to
> prevent the kernel from overwriting the already correct TSC ADJUST values.
> But that also turned off the clock source watchdog and having the
> check_tsc_sync_target() test run was a good verification step.  Our goal
> was/is to conform as much as possible to the checks and the only unfixable
> code was the assumption that socket 0 should have a TSC_ADJUST value of 0.

Which can be removed. If you don't want to do it. I'm happy to write the
patch^Wchangelog myself.

Thanks,

	tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ