Message-ID: <4ea6e82c-4761-e0c9-3e75-8ec39eecb30a@zytor.com>
Date: Mon, 5 Jun 2023 09:32:24 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: David Laight <David.Laight@...LAB.COM>,
"'Thomas Gleixner'" <tglx@...utronix.de>,
Muhammad Usama Anjum <usama.anjum@...labora.com>,
Jonathan Corbet <corbet@....net>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>
Cc: Steven Noonan <steven@...inklabs.net>,
"kernel@...labora.com" <kernel@...labora.com>
Subject: Re: Direct rdtsc call side-effect
On 6/5/23 08:54, David Laight wrote:
> From: Thomas Gleixner <tglx@...utronix.de>
>> Sent: 05 June 2023 15:44
>>
>> On Mon, Jun 05 2023 at 10:27, David Laight wrote:
>>> It has to be said that using it as a time source was fundamentally
>>> a bad idea.
>>
>> Too bad you weren't around many moons ago and educated us on that. That
>> would have saved us lots of trouble and work.
>
> Indeed :-)
> I do remember thinking the TSC was really a good time source when
> I first saw it being done about 30 years ago.
>
The TSC is certainly not perfect; partly because, ironically enough, it
was introduced just *before* out-of-order execution and power management
entered the x86 world.
It is no secret that it has been slow to catch up. It was easy to put a
counter in; it is a *lot* harder to make it work in all the possible
scenarios in the power-managed, out-of-order world.
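One piece of that catch-up that user space can see is the invariant TSC feature, which CPUID advertises when the counter ticks at a constant rate across P-, C- and T-states. A minimal sketch for checking it, assuming x86 and GCC or Clang (for <cpuid.h>); the function name has_invariant_tsc is hypothetical:

```c
#include <cpuid.h> /* __get_cpuid (GCC/Clang, x86 only) */

/* Hypothetical helper: returns 1 if CPUID leaf 0x80000007 reports an
 * invariant TSC (EDX bit 8), 0 if not or if the leaf is unsupported. */
int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid checks the maximum extended leaf before executing CPUID. */
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 8) & 1;
}
```

An invariant TSC keeps the counter usable as a clock; it says nothing about how many instructions the core retired in that time.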
It is one of my personal pet projects in the architecture work to push
it that last distance; we are not there yet.
>
> I'm thinking of benchmarking the IP checksum code where you are
> trying to find out how many bytes/clock the loop is doing.
> On recent x86-64 the theoretical limit (without fighting AVX) is 16
> bytes/clock, I've measured 12, 8 is relatively easy.
> (The current asm code runs at 4 on older cpus, and doesn't get
> much above 6 on any of them.)
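A minimal sketch of such a bytes-per-clock measurement, assuming x86-64 with GCC or Clang. The checksum is a plain RFC 1071-style ones'-complement sum, not the kernel's optimized assembly, and the names ones_complement_sum and bytes_per_tsc_tick are hypothetical. Note that the denominator is TSC ticks, not core cycles:

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, _mm_lfence (GCC/Clang on x86) */

/* Hypothetical helper: 16-bit ones'-complement sum over big-endian words,
 * RFC 1071 style. Illustrative only, not the kernel's csum code. */
uint16_t ones_complement_sum(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8; /* pad odd trailing byte with zero */
    while (sum >> 16)                        /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical helper: bytes processed per TSC tick over `reps` passes.
 * This is per constant-rate reference tick, so it shifts with core frequency. */
double bytes_per_tsc_tick(const uint8_t *buf, size_t len, int reps)
{
    volatile uint16_t sink = 0; /* keep the results observable */

    _mm_lfence(); /* keep earlier instructions from overlapping the start read */
    uint64_t start = __rdtsc();
    for (int r = 0; r < reps; r++) {
        sink += ones_complement_sum(buf, len);
        __asm__ __volatile__("" ::: "memory"); /* force recomputation each pass */
    }
    _mm_lfence();
    uint64_t end = __rdtsc();
    (void)sink;
    return (double)len * reps / (double)(end - start);
}
```

The first function can be checked against the worked example in RFC 1071: the words 0x0001, 0xf203, 0xf4f5, 0xf6f7 sum to 0xddf2.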
>
> What happens is that the cpu frequency speeds up as soon as the
> test starts but the TSC frequency stays constant.
> So you can only use the TSC to measure time, not execution speed.
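Used as a clock, a TSC delta only needs the TSC frequency to become wall time. A sketch, assuming the caller already knows that frequency in kHz (obtaining it is outside this sketch); tsc_delta_to_ns is a hypothetical name, and the kernel itself uses a precomputed multiply-and-shift rather than a division:

```c
#include <stdint.h>

/* Hypothetical helper: convert a TSC delta to nanoseconds given the TSC
 * frequency in kHz (which equals TSC ticks per millisecond). Splitting into
 * whole milliseconds plus a remainder keeps delta * 1e6 from overflowing. */
uint64_t tsc_delta_to_ns(uint64_t delta, uint64_t tsc_khz)
{
    uint64_t whole_ms = delta / tsc_khz;
    uint64_t rem = delta % tsc_khz; /* rem < tsc_khz, so rem * 1e6 fits */

    return whole_ms * 1000000u + rem * 1000000u / tsc_khz;
}
```

For example, 1500 ticks of a 3 GHz TSC (tsc_khz = 3000000) is 500 ns, whatever frequency the core was running at.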
>
> Run enough copies of 'while :; do :; done &' to make all but one
> cpu busy and the cpus all speed up giving completely different
> TSC counts for short loops.
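The arithmetic behind the discrepancy: core cycles elapsed equal TSC ticks times core_hz / tsc_hz, so a rate measured per tick has to be scaled by tsc_hz / core_hz to become a rate per core cycle. A sketch with a hypothetical helper; the catch is that core_hz is exactly the number that moves under turbo and power management:

```c
/* Hypothetical helper: rescale a per-TSC-tick rate into a per-core-cycle
 * rate, assuming the core ran at a known, steady core_hz throughout. */
double bytes_per_core_cycle(double bytes_per_tick, double tsc_hz, double core_hz)
{
    return bytes_per_tick * tsc_hz / core_hz;
}
```

For example, 12 bytes per tick on a 3 GHz TSC with the core boosted to 4 GHz is only 9 bytes per core cycle.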
>
That is the reason for the architecturally fixed performance counters,
which include a count of unhalted core clock cycles rather than
constant-rate reference ticks.
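On Linux, user space can read such a cycle counter through perf_event_open(2). A minimal sketch, assuming a kernel and CPU that expose the generic PERF_COUNT_HW_CPU_CYCLES event (it can fail in VMs or under a restrictive perf_event_paranoid setting); on Intel CPUs the kernel can schedule this event on the fixed counter for unhalted core cycles. core_cycles_of and spin_workload are hypothetical names:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a user-mode core-cycle counter for the calling thread on any CPU. */
static int open_cycle_counter(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* start stopped; enabled around the workload */
    attr.exclude_kernel = 1; /* count user-mode cycles only */
    attr.exclude_hv = 1;
    /* glibc has no wrapper, so call the syscall directly: pid 0, cpu -1. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical helper: core cycles spent in fn(arg), or -1 if unavailable. */
long long core_cycles_of(void (*fn)(void *), void *arg)
{
    uint64_t count = 0;
    int fd = open_cycle_counter();

    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    return n == (ssize_t)sizeof(count) ? (long long)count : -1;
}

/* Hypothetical workload: a counted loop the compiler cannot remove. */
void spin_workload(void *arg)
{
    unsigned long n = *(unsigned long *)arg;
    volatile unsigned long i;

    for (i = 0; i < n; i++)
        ;
}
```

Dividing bytes processed by this count gives bytes per core clock, independent of the frequency the core happened to run at.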
-hpa