linux-kernel - Re: Skylake (XPS 13 9350) TSC is way off

Message-ID: <CALCETrUL=DdwhoPK2TaPwjMrpfVm8cn62ASfNjZfO3NcwQ7H8g@mail.gmail.com>
Date:	Wed, 2 Dec 2015 15:42:00 -0800
From:	Andy Lutomirski <luto@...nel.org>
To:	John Stultz <john.stultz@...aro.org>
Cc:	Andy Lutomirski <luto@...nel.org>,
	"Brown, Len" <len.brown@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>,
	"Hunter, Adrian" <adrian.hunter@...el.com>
Subject: Re: Skylake (XPS 13 9350) TSC is way off

On Wed, Dec 2, 2015 at 3:38 PM, John Stultz <john.stultz@...aro.org> wrote:
> On Wed, Dec 2, 2015 at 3:25 PM, Andy Lutomirski <luto@...nel.org> wrote:
>> In case it's at all useful, adjtimex -p says:
>>
>>          mode: 0
>>        offset: 0
>>     frequency: 135641
>>      maxerror: 37498
>>      esterror: 1532
>>        status: 8192
>> time_constant: 2
>>     precision: 1
>>     tolerance: 32768000
>>          tick: 10000
>>      raw time:  1449098317s 671243180us = 1449098317.671243180
>>
>> this suggests a rather small correction, so I really have no idea what
>> "Adjusting tsc more than 11% (8039115 vs 7759462)" means.
>>
>> John, you wrote this code.  What does the error message mean?
>
> Basically, the internal correction adjustments are getting pulled further
> than they are supposed to be (it's concerning since in some cases we push
> the clocksource mult value to be quite large, so making a large
> adjustment could possibly cause an overflow).
>
> A while back I had intended to cap the max adjustment, but out of
> caution I put in a warning instead to see how often this might occur.
>
> I've seen it reported sometimes while folks were running trinity or
> under a VM (suggesting that due to system delays timekeeping
> management may have been delayed and the internal time error had grown
> quite far, so the internal correction was being somewhat aggressive).
> Though more recently (3.17 era) we've changed the internal adjustment
> code to try to be more conservative to avoid over-steering w/ NOHZ, so
> I'd expect fewer of these.
>

The trouble for me is that it's not clear from the message what rate
doesn't agree with what rate (kernel's unadjusted rate vs adjtimex's
request?), and the units are incomprehensible.  If the issue is that
adjtimex(2) has asked for X PPM of adjustment and X is greater than Y,
could we display that directly?
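
To make the unit mismatch concrete, here is a rough sketch of the
arithmetic. It assumes the "frequency" and "tolerance" fields printed by
adjtimex -p are in the adjtimex(2) unit of ppm with a 16-bit fractional
part, and it treats the two numbers in the warning as plain values to be
compared; what they actually represent is not established in this
thread. The helper names are made up for illustration.

```python
# Sketch only: helper names are hypothetical, not kernel or libc APIs.

SCALED_PPM_SHIFT = 16  # adjtimex(2) freq: ppm with a 16-bit fraction


def scaled_ppm_to_ppm(scaled):
    """Convert an adjtimex-style scaled frequency value to plain ppm."""
    return scaled / (1 << SCALED_PPM_SHIFT)


def relative_diff_ppm(value, reference):
    """Relative difference of value against reference, in ppm."""
    return (value - reference) / reference * 1e6


if __name__ == "__main__":
    # Values quoted earlier in this thread.
    print("frequency: %.4f ppm" % scaled_ppm_to_ppm(135641))
    print("tolerance: %.1f ppm" % scaled_ppm_to_ppm(32768000))
    # Assumption: simply comparing the two numbers from the warning text.
    print("warning pair differs by %.0f ppm"
          % relative_diff_ppm(8039115, 7759462))
```

Under those assumptions the requested frequency correction is about 2 ppm
against a 500 ppm tolerance, while the two numbers in the warning differ
by a few percent, which is why a message in PPM would be easier to
relate to what adjtimex reports.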

> On a hunch, are you running chrony instead of ntpd?

Yes, this is indeed chrony.

--Andy