Date:	Fri, 20 May 2011 11:22:54 +0300
From:	Janne Blomqvist <blomqvist.janne@...il.com>
To:	luto@....edu, hpa@...or.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 2/6] x86-64: Remove unnecessary barrier in vread_tsc

Hi,

For some further micro-optimization of clock_gettime(), one could
consider using "rdtscp" instead of "mfence;rdtsc" on AMD platforms.
With the patch to linux-clock-tests.git at the bottom of this mail
(Andy: feel free to commit it to the upstream repo), I get the
following on an Opteron 2435 (6-core, 2.6 GHz):

$ ./timing_test 100 mfence_rdtsc CLOCK_MONOTONIC
100000000 loops in 3.96236s = 39.62 nsec / loop

$ ./timing_test 100 rdtscp CLOCK_MONOTONIC
100000000 loops in 2.87859s = 28.79 nsec / loop

(best of 5 runs for each test)

For comparison, on a Xeon X3450 (Nehalem 2.67 GHz) I get:

$ ./timing_test 100 lfence_rdtsc CLOCK_MONOTONIC
100000000 loops in 0.95337s = 9.53 nsec / loop

$ ./timing_test 100 rdtscp CLOCK_MONOTONIC
100000000 loops in 1.02148s = 10.21 nsec / loop

(same as above, best of 5 runs for each test)

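So on Intel, "lfence;rdtsc" seems slightly faster than "rdtscp"
(9.53 vs. 10.21 nsec / loop, about 7%), whereas on the Opteron
"rdtscp" takes about 27% less time per loop than "mfence;rdtsc"
(28.79 vs. 39.62 nsec / loop).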
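
For reference, the three sequences compared above can be wrapped as
small helpers along the following lines. This is only a sketch and is
not part of the patch: the names combine_tsc, read_tsc_lfence,
read_tsc_mfence and read_tsc_rdtscp are made up for illustration, and
the "memory" clobber is an addition that the test loops in
timing_test.cc do not use. It assumes GCC-style inline asm on x86.

```cpp
#include <cstdint>

// Hypothetical helper: join the EDX:EAX halves that rdtsc/rdtscp
// return into a single 64-bit cycle count.
static inline std::uint64_t combine_tsc(std::uint32_t lo, std::uint32_t hi)
{
        return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

#if defined(__x86_64__) || defined(__i386__)
// "lfence;rdtsc": the barrier variant measured on the Intel box.
static inline std::uint64_t read_tsc_lfence()
{
        std::uint32_t lo, hi;
        asm volatile ("lfence;rdtsc" : "=a" (lo), "=d" (hi) : : "memory");
        return combine_tsc(lo, hi);
}

// "mfence;rdtsc": the barrier variant measured on the AMD box.
static inline std::uint64_t read_tsc_mfence()
{
        std::uint32_t lo, hi;
        asm volatile ("mfence;rdtsc" : "=a" (lo), "=d" (hi) : : "memory");
        return combine_tsc(lo, hi);
}

// "rdtscp": waits for earlier instructions to execute before reading
// the counter, so no separate fence is issued. It also writes
// IA32_TSC_AUX into ECX, hence the extra clobber.
static inline std::uint64_t read_tsc_rdtscp()
{
        std::uint32_t lo, hi;
        asm volatile ("rdtscp" : "=a" (lo), "=d" (hi) : : "ecx", "memory");
        return combine_tsc(lo, hi);
}
#endif
```

Calling one of these in a tight loop and dividing the elapsed time by
the iteration count should give per-call figures comparable to the
ones quoted above, modulo the cost of combining the two halves.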


diff --git a/timing_test.cc b/timing_test.cc
index 136a731..3a40d41 100644
--- a/timing_test.cc
+++ b/timing_test.cc
@@ -33,8 +33,12 @@ int main(int argc, char **argv)
                printf("\nClocks are:\n");
                describe_clock("CLOCK_REALTIME", CLOCK_REALTIME);
                describe_clock("CLOCK_MONOTONIC", CLOCK_MONOTONIC);
+#ifdef CLOCK_REALTIME_COARSE
                describe_clock("CLOCK_REALTIME_COARSE", CLOCK_REALTIME_COARSE);
+#endif
+#ifdef CLOCK_MONOTONIC_COARSE
                describe_clock("CLOCK_MONOTONIC_COARSE", CLOCK_MONOTONIC_COARSE);
+#endif
                return 1;
        }

@@ -73,6 +77,11 @@ int main(int argc, char **argv)
                        asm volatile ("");
                        asm volatile ("lfence;rdtsc;lfence" : "=a" (a), "=d" (d));
                }
+       } else if (!strcmp(mode, "mfence_rdtsc")) {
+               for (size_t i = 0; i < loops; ++i) {
+                       unsigned int a, d;
+                       asm volatile ("mfence;rdtsc" : "=a" (a), "=d" (d));
+               }
        } else if (!strcmp(mode, "mfence_rdtsc_mfence")) {
                for (size_t i = 0; i < loops; ++i) {
                        unsigned int a, d;


-- 
Janne Blomqvist
