Message-ID: <87imqqrj97.fsf@vitty.brq.redhat.com>
Date: Wed, 21 Aug 2019 10:54:28 +0200
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: Michael Kelley <mikelley@...rosoft.com>,
Tianyu Lan <lantianyu1986@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Tianyu Lan <Tianyu.Lan@...rosoft.com>,
"linux-arch\@vger.kernel.org" <linux-arch@...r.kernel.org>,
"linux-hyperv\@vger.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel\@vger kernel org" <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
the arch/x86 maintainers <x86@...nel.org>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Sasha Levin <sashal@...nel.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Arnd Bergmann <arnd@...db.de>
Subject: RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
Vitaly Kuznetsov <vkuznets@...hat.com> writes:
> Michael Kelley <mikelley@...rosoft.com> writes:
>
>> I talked to KY Srinivasan for any history about TSC page on 32-bit. He said
>> there was no technical reason not to implement it, but our focus was always
>> 64-bit Linux, so the 32-bit was much less important. Also, on 32-bit Linux,
>> the required 64x64 multiply and shift is more complex and takes
>> more cycles (compare 32-bit implementation of mul_u64_u64_shr vs.
>> the 64-bit implementation), so the win over a MSR read is less. I
>> don't know of any actual measurements being made to compare vs.
>> MSR read.
>
> VMExit is 1000 CPU cycles or so, I would guess that TSC page
> calculations are better. Let me try to build 32bit kernel and do some
> quick measurements.
So I tried and the difference is HUGE.
For in-kernel clocksource reads (like sched_clock()), the testing code
was:

	before = rdtsc_ordered();
	for (i = 0; i < 1000; i++)
		(void)read_hv_sched_clock_msr();
	after = rdtsc_ordered();
	printk("MSR based clocksource: %d cycles\n", ((u32)(after - before))/1000);

	before = rdtsc_ordered();
	for (i = 0; i < 1000; i++)
		(void)read_hv_sched_clock_tsc();
	after = rdtsc_ordered();
	printk("TSC page clocksource: %d cycles\n", ((u32)(after - before))/1000);

The result (WS2016) is:
[ 1.101910] MSR based clocksource: 3361 cycles
[ 1.105224] TSC page clocksource: 49 cycles
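
For context on why the TSC page read stays cheap even with the more expensive 32-bit arithmetic: the conversion is one 64x64 multiply keeping the high 64 bits, plus an add, with no VM exit. The following is only a minimal sketch, not the kernel's code. The helper names mul_u64_u64_hi() and tsc_page_ref_time() are made up for illustration, the multiply is written with 32-bit partial products to mimic what a 32-bit build has to do, and the TSC page sequence-number check is left out.

```c
#include <stdint.h>

/*
 * Hypothetical helper: high 64 bits of a 64x64-bit product, built from
 * four 32x32->64 partial products (roughly what a 32-bit target needs,
 * since it has no native 64x64->128 multiply).
 */
static uint64_t mul_u64_u64_hi(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;

	/* Sum the terms that land in bits 32..63 to obtain the carry into bit 64. */
	uint64_t mid = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

	return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}

/*
 * Hypothetical helper: reference time = ((tsc * scale) >> 64) + offset.
 * The sequence-number check that guards a real TSC page read is omitted.
 */
static uint64_t tsc_page_ref_time(uint64_t tsc, uint64_t scale, uint64_t offset)
{
	return mul_u64_u64_hi(tsc, scale) + offset;
}
```

With a scale of 1ULL << 63 (that is, one half) and an offset of 5, a TSC value of 1000 maps to 505. The result is in whatever unit the scale encodes; for Hyper-V reference time that is 100ns ticks.
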
For userspace reads the absolute difference is even bigger, as the TSC
page gives us a functional vDSO:
Testing code:

	before = rdtsc();
	for (i = 0; i < COUNT; i++)
		clock_gettime(CLOCK_REALTIME, &tp);
	after = rdtsc();
	printf("%d\n", (after - before)/COUNT);

Result:
TSC page:
# ./gettime_cycles
131
MSR:
# ./gettime_cycles
5664
Given all that, I see no reason not to enable the TSC page on 32-bit.
Even if the number of users is negligible, this will allow us to get rid
of the ugly #ifdef CONFIG_HYPERV_TSCPAGE in the code.
I'll send a patch for discussion.
--
Vitaly