Message-ID: <3daa086c-b4a0-47a9-8bfc-aac4139013c4@paulmck-laptop>
Date:   Fri, 31 Mar 2023 10:16:59 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Feng Tang <feng.tang@...el.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: A couple of TSC questions

On Tue, Mar 28, 2023 at 02:58:54PM -0700, Paul E. McKenney wrote:
> On Mon, Mar 27, 2023 at 10:19:54AM +0800, Feng Tang wrote:
> > On Fri, Mar 24, 2023 at 05:47:33PM -0700, Paul E. McKenney wrote:
> > > On Wed, Mar 22, 2023 at 01:14:48PM +0800, Feng Tang wrote:

[ . . . ]

> > > > > Second, we are very occasionally running into console messages like this:
> > > > > 
> > > > > Measured 2 cycles TSC warp between CPUs, turning off TSC clock.
> > > > > 
> > > > > This comes from check_tsc_sync_source() and indicates that one CPU's
> > > > > TSC read produced a later time than a later read from some other CPU.
> > > > > I am beginning to suspect that these can be caused by unscheduled delays
> > > > > in the TSC synchronization code, but figured I should ask you if you have
> > > > > ever seen these.  And of course, if so, what the usual causes might be.
> > > > 
> > > > I haven't seen this error myself or received similar reports. It
> > > > should usually be easy to detect once it happens, as falling back
> > > > to HPET causes an obvious performance degradation.
> > > 
> > > And that is exactly what happened.  ;-)
> > > 
> > > > Could you give more detail about when and how it happens, and
> > > > the HW info, such as how many sockets the platform has?
> > > 
> > > We are in the early days, so I am checking for other experiences.
> > > 
> > > > CC Thomas, Waiman, as they discussed a similar case here:
> > > > https://lore.kernel.org/lkml/87h76ew3sb.ffs@tglx/T/#md4d0a88fb708391654e78312ffa75b481690699f
> > > 
> > > Fun!  ;-)
> 
> Waiman, do you recall what fraction of the benefit was provided by the
> first patch, that is, the one that grouped the sync_lock, last_tsc,
> max_warp, nr_warps, and random_warps global variables into a single
> struct?
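To make the struct-grouping question concrete, here is a minimal C11 sketch of a TSC warp check whose shared state (the sync_lock, last_tsc, max_warp, nr_warps, and random_warps variables named above) lives in one struct. This is not the kernel's code: the struct name tsc_sync_state, the helper check_tsc_reading(), the atomic_flag spinlock standing in for sync_lock, and the 64-byte cache-line alignment are all hypothetical choices made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical layout: all state shared by the CPUs running the warp
 * check lives in one struct, aligned to an assumed 64-byte cache line.
 */
struct tsc_sync_state {
	_Alignas(64) atomic_flag lock;	/* stands in for sync_lock */
	uint64_t last_tsc;		/* latest TSC value published by any CPU */
	uint64_t max_warp;		/* largest backwards step seen so far */
	int nr_warps;			/* number of backwards steps seen */
	int random_warps;		/* kept for layout only; not updated here */
};

/*
 * Publish one TSC reading and compare it against the previous one,
 * which may have come from another CPU.  If the previous value is
 * later than this one, time went backwards, so record the warp.
 */
static void check_tsc_reading(struct tsc_sync_state *s, uint64_t now)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;	/* spin until the lock is free */

	uint64_t prev = s->last_tsc;
	s->last_tsc = now;
	if (prev > now) {
		uint64_t warp = prev - now;
		if (warp > s->max_warp)
			s->max_warp = warp;
		s->nr_warps++;
	}

	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}
```

Feeding a zero-initialized state the readings 1000, 818, 1100, 918 (a second CPU lagging by exactly 182 cycles) leaves max_warp at 182 and nr_warps at 2. A real check would read the TSC while holding the lock; this sketch takes `now` as a parameter instead so that it can be driven deterministically.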

And what we are seeing is unlikely to be due to cache-latency-induced
delays.  We see a very precise warp: for example, one system always
has 182 cycles of TSC warp, another 273 cycles, and a third 469 cycles.
Another is at the insanely large value of about 2^64/10, and shows some
variation, but that variation is only about 0.1%.
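To put these magnitudes in perspective, here is a small C sketch converting warps from cycles to time. The 2 GHz TSC frequency is an assumption for illustration only; the affected systems' TSC rates are not given here, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption for illustration: a 2 GHz TSC. No real frequency is given. */
#define ASSUMED_TSC_HZ 2000000000.0
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

/* About 2^64/10: one tenth of the full unsigned 64-bit range. */
static uint64_t tenth_of_tsc_range(void)
{
	return UINT64_MAX / 10;	/* 1844674407370955161 */
}

/* Convert a warp in TSC cycles to nanoseconds at the assumed rate. */
static double warp_cycles_to_ns(uint64_t cycles)
{
	return (double)cycles * 1e9 / ASSUMED_TSC_HZ;
}

/* Convert a warp in TSC cycles to years at the assumed rate. */
static double warp_cycles_to_years(uint64_t cycles)
{
	return (double)cycles / ASSUMED_TSC_HZ / SECONDS_PER_YEAR;
}
```

At the assumed 2 GHz, the 182-, 273-, and 469-cycle warps come to 91, 136.5, and 234.5 ns, while a warp of about 2^64/10 cycles is roughly 29.2 years. Even a 0.1% variation on that value is about 1.8e15 cycles, or about 10.7 days at the same rate.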

But any given system only sees warp on about half of its reboots.
Perhaps due to the automation sometimes power cycling?

There are few enough affected systems that investigation will take
some time.

							Thanx, Paul
