linux-kernel - Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being non-null is valid

Message-ID: <f6109e79-b77e-dc20-c216-e61a08d1d6a1@hpe.com>
Date:   Mon, 25 Sep 2017 09:47:07 -0700
From:   Mike Travis <mike.travis@....com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Bin Gao <bin.gao@...ux.intel.com>,
        Prarit Bhargava <prarit@...hat.com>,
        Dimitri Sivanich <dimitri.sivanich@....com>,
        Andrew Banman <andrew.banman@....com>,
        Russ Anderson <russ.anderson@....com>,
        linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH 1/3] x86/kernel: Add option that TSC on Socket 0 being
 non-null is valid

On 9/25/2017 8:30 AM, Thomas Gleixner wrote:
> On Thu, 21 Sep 2017, mike.travis@....com wrote:
>> +/*
>> + * TSC on socket 0 being non-zero may be correct as set by BIOS
>> + */
>> +static int __read_mostly tsc_socket0_nonzero;
>> +
>>   /* native_sched_clock() is called before tsc_init(), so
>>      we must start with the TSC soft disabled to prevent
>>      erroneous rdtsc usage on !boot_cpu_has(X86_FEATURE_TSC) processors */
>> @@ -244,6 +249,20 @@ int check_tsc_unstable(void)
>>   }
>>   EXPORT_SYMBOL_GPL(check_tsc_unstable);
>>   
>> +void mark_tsc_socket0_nonzero(char *reason)
>> +{
>> +	tsc_socket0_nonzero = 1;
>> +	pr_info("Marking TSC non-zero value valid for socket 0 due to %s\n",
>> +		reason);
>> +}
>> +EXPORT_SYMBOL_GPL(mark_tsc_socket0_nonzero);
>>
>> +int check_tsc_socket0_nonzero(void)
>> +{
>> +	return tsc_socket0_nonzero;
>> +}
>> +EXPORT_SYMBOL_GPL(check_tsc_socket0_nonzero);
> 
> Is there a real reason to export these functions? I can't see the UV early
> boot code and tsc_sync being built as modules in the foreseeable future, but
> perhaps you know more than I do :)

Yes, that was a mistake. I originally added this by following the
example of the check_tsc_unstable() function. I had a later patch
that moved this into tsc_sync.c as a local flag, but apparently the
earlier version crept back in. I'll fix that.
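Below is a userspace sketch of what a file-local version of the flag could look like once it lives in tsc_sync.c. It assumes the accessors keep the names from the quoted patch; printf() stands in for the kernel's pr_info(). It is an illustration, not the follow-up patch itself.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stdio.h>

/*
 * "static" gives the flag internal linkage: only this file can touch it.
 * The two accessors stay non-static, so other built-in code (for example
 * the UV early boot code) can still call them. EXPORT_SYMBOL_GPL() only
 * matters for loadable modules, so no export is needed here.
 */
static bool tsc_socket0_nonzero;

/* Record that a non-zero TSC value on socket 0 is valid (set by BIOS). */
void mark_tsc_socket0_nonzero(const char *reason)
{
	tsc_socket0_nonzero = true;
	printf("Marking TSC non-zero value valid for socket 0 due to %s\n",
	       reason);
}

/* Query the flag; false until mark_tsc_socket0_nonzero() has run. */
bool check_tsc_socket0_nonzero(void)
{
	return tsc_socket0_nonzero;
}
```

Platform setup would call mark_tsc_socket0_nonzero() once, early, and the TSC sync code would only ever read the flag through check_tsc_socket0_nonzero().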
> 
> Aside from that, I really do not like this kind of special-case hackery. The
> real question is whether we need to enforce TSC_ADJUST == 0 on the boot cpu
> at all. In principle we don't anymore now that we handle that TSC deadline
> timer wreckage cleanly.

I am hesitant to make such a global change, as it appears the author
added this check intentionally. It not only threw our internal TSC sync
tests totally out of whack, it also generated an avalanche of error
messages on the system console (more than 3000 messages on a 32-socket
Skylake system). And I don't have the means to test how major changes
to the TSC_ADJUST functions would affect standard whitebox PCs.
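The policy under discussion can be modelled in a few lines: by default a non-zero TSC_ADJUST on the boot CPU is treated as a firmware bug and forced to 0, and the proposed flag would let a platform keep the BIOS-programmed value. The function name, its signature and the printed messages are hypothetical; the MSR write is modelled by returning the value the caller would write.

```c
#include <assert.h>   /* used by the usage checks */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical model of boot-CPU TSC_ADJUST handling.
 *   bootval            - value read from the boot CPU's TSC_ADJUST MSR
 *   socket0_nonzero_ok - platform has marked a non-zero value as valid
 * Returns the value that should end up in the boot CPU's TSC_ADJUST.
 */
static int64_t sanitize_boot_tsc_adjust(int64_t bootval, bool socket0_nonzero_ok)
{
	if (bootval == 0)
		return 0;                /* nothing to sanitize */

	if (socket0_nonzero_ok) {
		/* Firmware set this on purpose (e.g. chassis reset at
		 * different times): keep it. Illustrative message. */
		printf("boot CPU TSC_ADJUST %lld kept\n", (long long)bootval);
		return bootval;
	}

	/* Default: treat a non-zero boot CPU value as a firmware bug. */
	printf("boot CPU TSC_ADJUST %lld forced to 0\n", (long long)bootval);
	return 0;
}
```

Dropping the enforcement entirely, as suggested, would amount to always taking the "keep it" branch.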

Our BIOS team did make a change to conform to the "TSC_ADJUST should be
the same on all CPU threads on a single socket" requirement, so we were
able to pass that part of the TSC validation functions. (Before this
change, the TSCs were synced by writing directly to the TSC MSR, and
natural delays in the processor firmware caused slight differences in
the TSC_ADJUST values.)
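The per-socket requirement is easy to state as a check: threads on the same socket must all report the same TSC_ADJUST value, while different sockets may differ. The sketch below uses a hypothetical record type holding a socket id and the value read from each thread; it is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record: socket id and its TSC_ADJUST value. */
struct thread_tsc_adjust {
	int socket;
	int64_t adjust;
};

/*
 * Return true if every thread matches the first thread seen on its
 * socket. Values may differ between sockets, never within one.
 * O(n^2) for clarity; n is the number of threads.
 */
static bool tsc_adjust_consistent_per_socket(const struct thread_tsc_adjust *t,
					     size_t n)
{
	assert(t != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < i; j++) {
			if (t[j].socket != t[i].socket)
				continue;
			/* t[j] is the first earlier thread on this socket. */
			if (t[j].adjust != t[i].adjust)
				return false;
			break;
		}
	}
	return true;
}
```

Writing the TSC MSR directly on each thread, as the old firmware did, tends to leave small per-thread differences that fail this check; setting one TSC_ADJUST value per socket passes it.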

> 
> But the UV 'boot chassis at different times' brings me to a related
> question:

Essentially, the system reset signals are distributed in various ways,
which causes the different chassis to start up asynchronously with
respect to each other. The chassis of a UV system are not "hard" bound
to each other; the system adapts to its configuration as it starts up.

> 
> How is this setup dealing with ART (Always Running Timer, which is
> distributed over PCIe for hardware timestamping and hardware assisted event
> correlation)?
> 
> I assume that ART on UV is also per chassis, but that means that the
> documented relationship of:
> 
> 	TSC = ART * n/d + offset
> 
> where $offset is system wide (the TSC_ADJUST value of the boot cpu), is
> not applicable.
> 
> Is there some other magic in play which makes ART work across chassis?
>
> Thanks,
>
> 	tglx
>

Sorry, I'm not sure how the UV hardware mimics the concept of 'ART'. It
does have an external clock generator that is distributed as part of the
NumaLink protocol and signal set. Since separate chassis can be
configured to be either within the same SSI (single system image) or in
separate SSIs, the system can configure which chassis are in sync with
each other and which run on a different clock sync. This is all within
the purview of the BIOS folks.
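The relationship quoted above, TSC = ART * n/d + offset, can be sketched as a small conversion routine, loosely modelled on the kernel's convert_art_to_tsc(). The parameter names are assumptions. Splitting ART into quotient and remainder by d keeps the intermediate products within 64 bits. With a single system-wide offset, chassis whose TSCs carry different TSC_ADJUST values would map the same ART reading to different TSC values, which is the concern raised.

```c
#include <assert.h>
#include <stdint.h>

/*
 * TSC = ART * n/d + offset, computed without overflowing on ART * n.
 *   n, d   - numerator/denominator of the TSC/ART ratio
 *   offset - system-wide offset (the boot CPU's TSC_ADJUST value)
 * art = q*d + r, so art*n/d = q*n + r*n/d; since r < d, r*n < 2^64.
 */
static uint64_t art_to_tsc(uint64_t art, uint32_t n, uint32_t d, uint64_t offset)
{
	assert(d != 0);

	uint64_t q = art / d;
	uint64_t r = art % d;

	return q * n + (r * n) / d + offset;
}
```

The result is floor(art * n / d) + offset; for example 1001 ART ticks with n/d = 3/2 and no offset give 1501.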

We do have independent methods to verify whether the TSCs are in sync
with each other by measuring the skew rate. Typical deviations on UV are
within a two-digit clock tick spread, which at an Uncore frequency of
2.5 GHz is in the small single-digit nanosecond range or less.
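Converting a tick spread to time is a division by the clock frequency: at 2.5 GHz one tick lasts 0.4 ns, so 10 ticks correspond to 4 ns. The helper below is a hypothetical illustration that uses integer picoseconds to avoid floating point.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a tick count at a clock of freq_mhz MHz to picoseconds.
 * One tick lasts 1e6 / freq_mhz ps (400 ps at 2500 MHz).
 * ticks * 1e6 overflows 64 bits above roughly 1.8e13 ticks, far more
 * than any skew measurement needs.
 */
static uint64_t ticks_to_ps(uint64_t ticks, uint64_t freq_mhz)
{
	assert(freq_mhz > 0);
	return ticks * 1000000u / freq_mhz;
}
```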

I'll post the updated patch set shortly.

Thanks,
Mike