[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <tip-99c19e6a8fe4a95fa0dac191207a1d40461b1604@git.kernel.org>
Date:   Fri, 5 Oct 2018 12:06:52 -0700
From:   tip-bot for Andy Lutomirski <tipbot@...or.com>
To:     linux-tip-commits@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, tglx@...utronix.de, mingo@...nel.org,
        luto@...nel.org, hpa@...or.com
Subject: [tip:x86/vdso] x86/vdso: Rearrange do_hres() to improve code
 generation
Commit-ID:  99c19e6a8fe4a95fa0dac191207a1d40461b1604
Gitweb:     https://git.kernel.org/tip/99c19e6a8fe4a95fa0dac191207a1d40461b1604
Author:     Andy Lutomirski <luto@...nel.org>
AuthorDate: Fri, 5 Oct 2018 11:02:43 -0700
Committer:  Thomas Gleixner <tglx@...utronix.de>
CommitDate: Fri, 5 Oct 2018 21:03:23 +0200
x86/vdso: Rearrange do_hres() to improve code generation
vgetcyc() is full of barriers, so fetching values out of the vvar
page before vgetcyc() for use after vgetcyc() results in poor code
generation.  Put vgetcyc() first to avoid this problem.
Also, pull the tv_sec division into the loop and put all the ts
writes together.  The old code wrote ts->tv_sec on each iteration
before the syscall fallback check and then added in the offset
afterwards, which forced the compiler to pointlessly copy base->sec
to ts->tv_sec on each iteration.  The new version seems to generate
sensible code.
Saves several cycles.  With this patch applied, the result is faster
than before the clock_gettime() rewrite.
Signed-off-by: Andy Lutomirski <luto@...nel.org>
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Link: https://lkml.kernel.org/r/3c05644d010b72216aa286a6d20b5078d5fae5cd.1538762487.git.luto@kernel.org
---
 arch/x86/entry/vdso/vclock_gettime.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 18c8a78d1ec9..007b3fe9d727 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -142,23 +142,27 @@ notrace static inline u64 vgetcyc(int mode)
 notrace static int do_hres(clockid_t clk, struct timespec *ts)
 {
 	struct vgtod_ts *base = >od->basetime[clk];
-	u64 cycles, last, ns;
+	u64 cycles, last, sec, ns;
 	unsigned int seq;
 
 	do {
 		seq = gtod_read_begin(gtod);
-		ts->tv_sec = base->sec;
+		cycles = vgetcyc(gtod->vclock_mode);
 		ns = base->nsec;
 		last = gtod->cycle_last;
-		cycles = vgetcyc(gtod->vclock_mode);
 		if (unlikely((s64)cycles < 0))
 			return vdso_fallback_gettime(clk, ts);
 		if (cycles > last)
 			ns += (cycles - last) * gtod->mult;
 		ns >>= gtod->shift;
+		sec = base->sec;
 	} while (unlikely(gtod_read_retry(gtod, seq)));
 
-	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+	/*
+	 * Do this outside the loop: a race inside the loop could result
+	 * in __iter_div_u64_rem() being extremely slow.
+	 */
+	ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
 	ts->tv_nsec = ns;
 
 	return 0;
Powered by blists - more mailing lists
 
