Message-ID: <tencent_4DC4468312A1CB2CA34B0215FAD797D11F07@qq.com>
Date: Tue, 16 May 2023 14:52:03 +0800
From: Rong Tao <rtoax@...mail.com>
To: tglx@...utronix.de
Cc: rtoax@...mail.com, Rong Tao <rongtao@...tc.cn>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
"H. Peter Anvin" <hpa@...or.com>,
linux-kernel@...r.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND
64-BIT))
Subject: [PATCH] x86/vdso: Use non-serializing instruction rdtsc
From: Rong Tao <rongtao@...tc.cn>
Replacing rdtscp or 'lfence;rdtsc' with the non-serializing instruction
rdtsc makes vDSO clock_gettime() about 40% faster, at the cost of a small
loss of precision.
The RDTSCP instruction is not a serializing instruction, but it does wait
until all previous instructions have executed and all previous loads are
globally visible. The RDTSC instruction is not a serializing instruction.
It does not necessarily wait until all previous instructions have been
executed before reading the counter.
Measure the time taken by repeated vDSO clock_gettime() calls; pseudocode:
    count = 1000 * 1000 * 100;
    while (count--)
        clock_gettime(CLOCK_REALTIME, &ts);
Elapsed time comparison:
 Time consumed (ns) | rdtsc_ordered() |  rdtsc()  | Improvement
--------------------+-----------------+-----------+-------------
 Physical machine   |      1269147289 | 759067324 | 40%
 Guest OS (KVM)     |      1756615963 | 995823886 | 43%
Signed-off-by: Rong Tao <rongtao@...tc.cn>
---
arch/x86/include/asm/vdso/gettimeofday.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index 4cf6794f9d68..342d29106208 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -228,7 +228,7 @@ static u64 vread_pvclock(void)
 		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
 			return U64_MAX;
 
-		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
+		ret = __pvclock_read_cycles(pvti, rdtsc());
 	} while (pvclock_read_retry(pvti, version));
 
 	return ret;
@@ -246,7 +246,7 @@ static inline u64 __arch_get_hw_counter(s32 clock_mode,
 					const struct vdso_data *vd)
 {
 	if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc();
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check. Otherwise we
--
2.39.1