Message-ID: <tencent_4DC4468312A1CB2CA34B0215FAD797D11F07@qq.com>
Date:   Tue, 16 May 2023 14:52:03 +0800
From:   Rong Tao <rtoax@...mail.com>
To:     tglx@...utronix.de
Cc:     rtoax@...mail.com, Rong Tao <rongtao@...tc.cn>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        x86@...nel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
        "H. Peter Anvin" <hpa@...or.com>,
        linux-kernel@...r.kernel.org (open list:X86 ARCHITECTURE (32-BIT AND
        64-BIT))
Subject: [PATCH] x86/vdso: Use non-serializing instruction rdtsc

From: Rong Tao <rongtao@...tc.cn>

Replace rdtsc_ordered(), which executes rdtscp or 'lfence; rdtsc', with the
non-serializing rdtsc instruction in the vDSO. In the test below this cuts
the time spent in clock_gettime() by about 40%, at the cost of a small loss
of precision.

The RDTSCP instruction is not a serializing instruction, but it does wait
until all previous instructions have executed and all previous loads are
globally visible. The RDTSC instruction is not a serializing instruction.
It does not necessarily wait until all previous instructions have been
executed before reading the counter.
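
As an illustration only, the three ways of reading the TSC discussed above
can be sketched in user space. This assumes an x86 CPU and the GCC/Clang
<x86intrin.h> intrinsics; the function names tsc_plain(), tsc_lfence() and
tsc_rdtscp() are hypothetical and are not the kernel's helpers.

```c
#include <stdint.h>
#include <x86intrin.h>	/* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

/* Plain RDTSC: the CPU may read the counter before earlier instructions finish. */
static inline uint64_t tsc_plain(void)
{
	return __rdtsc();
}

/* LFENCE; RDTSC: the fence makes earlier instructions complete first. */
static inline uint64_t tsc_lfence(void)
{
	_mm_lfence();
	return __rdtsc();
}

/* RDTSCP: waits for earlier instructions; also returns IA32_TSC_AUX in aux. */
static inline uint64_t tsc_rdtscp(void)
{
	unsigned int aux;

	return __rdtscp(&aux);
}
```

The only difference between tsc_plain() and the other two is the ordering
guarantee, which is what this patch gives up in exchange for speed.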

Measure the time taken by the vDSO clock_gettime() with the following loop
(pseudocode):

    count = 1000 * 1000 * 100;
    while (count--)
        clock_gettime(CLOCK_REALTIME, &ts);
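
A self-contained version of this loop might look like the sketch below. How
the elapsed time was actually recorded is not stated here, so timing the loop
with CLOCK_MONOTONIC is an assumption, and bench_clock_gettime() is a
hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Return the nanoseconds spent on `iters` back-to-back clock_gettime() calls. */
static uint64_t bench_clock_gettime(uint64_t iters)
{
	struct timespec start, end, ts;
	int64_t ns;

	clock_gettime(CLOCK_MONOTONIC, &start);
	while (iters--)
		clock_gettime(CLOCK_REALTIME, &ts);	/* call under test */
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Combine the second and nanosecond differences into one value. */
	ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
	     + (end.tv_nsec - start.tv_nsec);
	return (uint64_t)ns;
}
```

Calling bench_clock_gettime(1000 * 1000 * 100) would match the iteration
count used above.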

Elapsed time comparison:

     Elapsed time (ns) | rdtsc_ordered() |  rdtsc()  | Reduction
    -------------------+-----------------+-----------+-----------
     Physical machine  |  1269147289     | 759067324 |   40%
     Guest OS (KVM)    |  1756615963     | 995823886 |   43%
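
The last column appears to be the relative reduction in elapsed time. That
reading is an assumption, but it reproduces both figures in the table. A
small check, with reduction_percent() as a hypothetical helper name:

```c
#include <stdint.h>

/* Percentage reduction in elapsed time, rounded down to a whole percent. */
static unsigned int reduction_percent(uint64_t before_ns, uint64_t after_ns)
{
	return (unsigned int)((before_ns - after_ns) * 100 / before_ns);
}
```

With the numbers above, reduction_percent(1269147289, 759067324) gives 40
and reduction_percent(1756615963, 995823886) gives 43.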

Signed-off-by: Rong Tao <rongtao@...tc.cn>
---
 arch/x86/include/asm/vdso/gettimeofday.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vdso/gettimeofday.h b/arch/x86/include/asm/vdso/gettimeofday.h
index 4cf6794f9d68..342d29106208 100644
--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -228,7 +228,7 @@ static u64 vread_pvclock(void)
 		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)))
 			return U64_MAX;
 
-		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
+		ret = __pvclock_read_cycles(pvti, rdtsc());
 	} while (pvclock_read_retry(pvti, version));
 
 	return ret;
@@ -246,7 +246,7 @@ static inline u64 __arch_get_hw_counter(s32 clock_mode,
 					const struct vdso_data *vd)
 {
 	if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc();
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check.  Otherwise we
-- 
2.39.1
