linux-kernel - Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9175f628-18e7-cae9-04db-cd284c4056c2@hpe.com>
Date:   Mon, 25 Sep 2017 17:13:25 -0700
From:   Mike Travis <mike.travis@....com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Bin Gao <bin.gao@...ux.intel.com>,
        Prarit Bhargava <prarit@...hat.com>,
        Dimitri Sivanich <dimitri.sivanich@....com>,
        Andrew Banman <andrew.banman@....com>,
        Russ Anderson <russ.anderson@....com>,
        linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being
 non-null is valid



On 9/25/2017 11:10 AM, Thomas Gleixner wrote:
> On Mon, 25 Sep 2017, Mike Travis wrote:
>> On 9/25/2017 8:30 AM, Thomas Gleixner wrote:
>>> Aside of that I really do not like this kind of special case hackery. The
>>> real question is whether we need to enforce TSC_ADJUST == 0 on the boot cpu
>>> at all. In principle we don't anymore now that we handle that TSC deadline
>>> timer wreckage cleanly.
>>
>> I am hesitant to make such a global change as it appears the author
>> intentionally added this.  It not only caused our internal tsc sync tests to
>> become totally out of whack, it also generated an avalanche of error messages
>> to the system console (>3000 messages for a 32 socket Skylake system).  And I
>> don't have the means to test how major changes to the TSC adjust functions
>> will affect standard whitebox PC's.
> 
> The reason why I put it there in the first place was to make the TSC
> deadline timer work on a whole range of systems. It turned out that our
> 'fix' was not enough so we changed that later to disable the deadline timer
> completely on affected systems when the firmware does not contain a fix for
> it. So there is no real technical reason anymore to enforce TSC_ADJUST == 0
> on the boot CPU. So rather than special casing this for UV we should just
> remove that requirement and leave the boot value as is.

Okay, I could do that but as I mentioned I'm not comfortable changing 
code that I cannot personally verify is correct.  I'm assuming you only 
mean removing the part where socket 0 should have a TSC_ADJUST value of 
0?  And the part mentioned below that all cpu threads on a socket should 
have the same adjust values should not be changed?
> 
>> Our BIOS team did make a change to conform to the "TSC_ADJUST should be the
>> same on all cpu threads on a single socket" requirement, so we were able to
>> pass that part of the TSC validation functions.  (Prior to this, the TSC's
>> were synced by writing directly to the TSC MSR and natural delays in the
>> processor firmware caused the slight differences in the TSC ADJUST values.)
> 
> Right. TSC_ADJUST is there for a reason.

Well, there is the problem with bit 63 (sign bit) in the TSC ADJUST MSR. 
So it's almost there... :)
> 
>>> But the UV 'boot chassis at different times' brings me to a related
>>> question:
>>
>> Essentially what happens is the system reset signals are distributed in
>> various ways which cause the different chassis to start up asynchronously with
>> each other.  The UV system is not "hard" bound to each other but adapts to the
>> system configuration as it starts up.
> 
> I figured that much.
> 
>>> How is this setup dealing with ART (Always Running Timer, which is
>>> distributed over PCIe for hardware timestamping and hardware assisted event
>>> correlation)?
>>>
>>> I assume that ART on UV is also per chassis, but that means that the
>>> documented relation ship of:
>>>
>>> 	TSC = ART * n/d + offset
>>>
>>> where $offset is system wide (the TSC_ADJUST value of the boot cpu), is
>>> not applicable.
>>>
>>> Is there some other magic in play which makes ART work across chassis?
>>>
>>> Thanks,
>>>
>>> 	tglx
>>>
>>
>> Sorry, I'm not sure how the UV hardware mimics the concept of 'ART'.  It does
>> have an external clock generator that is distributed as part of the NumaLink
>> protocol and signal set.  Since separate chassis can be configured to be
>> either within the same SSI or in separate SSI's then it has the ability to
>> configure which chassis are in sync with each other and which are on a
>> different clock sync.  This is all within the purview of the BIOS folks.
> 
> Cute. How is that supposed to work, when the chassis are out of sync?

Huh?  The TSC's are only guaranteed to be synced within an SSI.  As long 
as the sockets in the same SSI share a common clock, the BIOS can 
synchronize the TSC's to be very, very close.

>> We do have independent methods to verify if TSCs' are in sync with each other
>> by measuring the skew rate.  Typical deviations on UV are within a two digit
>> clock tick spread, which at an Uncore frequency of 2.5Ghz is in the small
>> single digit or less nanosecond range.
> 
> That should be good enough to pass the kernel side tests.
> 
> But back to my question about ART. You might talk to your BIOS/HW folks
> about that and eventually disable the ART related functionality in the
> kernel on UV.

Once again, I'm not sure what code you are talking about.  I talked to 
the BIOS engineer specifically working on TSC related items and he was 
also not familiar with the Intel ART spec.  I could go farther up the 
chain to the hardware designers but right now this would be extremely 
low on their priority queue.

Perhaps if you were more specific in what code should be disabled?  We 
would like to keep as much as the self check code as possible as there 
are still some remotely possible failure cases.  And removing the self 
checks would have a negative impact on reliability.

I actually did test using the CPU flags, "TSC reliable" and no "TSC 
ADJUST" to prevent the kernel from overwriting the already correct TSC 
ADJUST values.  But that also turned off the clock source watchdog and 
having the check_tsc_sync_target() test run was a good verification 
step.  Our goal was/is to conform as much as possible to the checks and 
the only unfixable code was the assumption that socket 0 should have a 
TSC_ADJUST value of 0.

Thanks,
Mike