lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <2323e1a9edc8304ed4fc9c3039dfbaf25dbb2f6b.1481658746.git.jslaby@suse.cz>
Date:   Tue, 13 Dec 2016 20:52:41 +0100
From:   Jiri Slaby <jslaby@...e.cz>
To:     stable@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org,
        Chris Metcalf <cmetcalf@...lanox.com>,
        Jiri Slaby <jslaby@...e.cz>
Subject: [PATCH 3.12 15/38] tile: avoid using clocksource_cyc2ns with absolute cycle count

From: Chris Metcalf <cmetcalf@...lanox.com>

3.12-stable review patch.  If anyone has any objections, please let me know.

===============

commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream.

For large values of "mult" and long uptimes, the intermediate
result of "cycles * mult" can overflow 64 bits.  For example,
the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
we have mult = 853, and after 208.5 days, we overflow 64 bits.

Since clocksource_cyc2ns() is intended to be used for relative
cycle counts, not absolute cycle counts, performance is more
importance than accepting a wider range of cycle values.  So,
just use mult_frac() directly in tile's sched_clock().

Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow
in sched_clock") by Salman Qazi results in essentially the same
generated code for x86 as this change does for tile.  In fact,
a follow-on change by Salman introduced mult_frac() and switched
to using it, so the C code was largely identical at that point too.

Peter Zijlstra then added mul_u64_u32_shr() and switched x86
to use it.  This is, in principle, better; by optimizing the
64x64->64 multiplies to be 32x32->64 multiplies we can potentially
save some time.  However, the compiler piplines the 64x64->64
multiplies pretty well, and the conditional branch in the generic
mul_u64_u32_shr() causes some bubbles in execution, with the
result that it's pretty much a wash.  If tilegx provided its own
implementation of mul_u64_u32_shr() without the conditional branch,
we could potentially save 3 cycles, but that seems like small gain
for a fair amount of additional build scaffolding; no other platform
currently provides a mul_u64_u32_shr() override, and tile doesn't
currently have an <asm/div64.h> header to put the override in.

Additionally, gcc currently has an optimization bug that prevents
it from recognizing the opportunity to use a 32x32->64 multiply,
and so the result would be no better than the existing mult_frac()
until such time as the compiler is fixed.

For now, just using mult_frac() seems like the right answer.

Signed-off-by: Chris Metcalf <cmetcalf@...lanox.com>
Signed-off-by: Jiri Slaby <jslaby@...e.cz>
---
 arch/tile/kernel/time.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
index 5d10642db63e..1edda6603cd9 100644
--- a/arch/tile/kernel/time.c
+++ b/arch/tile/kernel/time.c
@@ -216,8 +216,8 @@ void do_timer_interrupt(struct pt_regs *regs, int fault_num)
  */
 unsigned long long sched_clock(void)
 {
-	return clocksource_cyc2ns(get_cycles(),
-				  sched_clock_mult, SCHED_CLOCK_SHIFT);
+	return mult_frac(get_cycles(),
+			 sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT);
 }
 
 int setup_profiling_timer(unsigned int multiplier)
-- 
2.11.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ