lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 11 Nov 2008 21:51:12 +0000
From:	Russell King <rmk+lkml@....linux.org.uk>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc:	Nicolas Pitre <nico@....org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	torvalds@...ux-foundation.org, dhowells@...hat.com, mingo@...e.hu,
	a.p.zijlstra@...llo.nl, linux-kernel@...r.kernel.org,
	ralf@...ux-mips.org, benh@...nel.crashing.org, paulus@...ba.org,
	davem@...emloft.net, mingo@...hat.com, tglx@...utronix.de,
	rostedt@...dmis.org, linux-arch@...r.kernel.org
Subject: Re: [PATCH] convert cnt32_to_63 to inline

On Tue, Nov 11, 2008 at 03:11:30PM -0500, Mathieu Desnoyers wrote:
> I think the added barrier() are causing these pipeline stalls. They
> don't allow the compiler to read variables such as oscr2ns_scale before
> the barrier because gcc cannot assume it won't be modified. However, to
> insure that OSCR read is done after __m_cnt_hi read, this barrier seems
> required to be safe against gcc optimizations.
> 
> Have you compared my patch to Nicolas'patch, which adds a smp_rmb() in
> the macro or to a vanilla tree ?

Nicolas' patch compared to unmodified - there's less side effects,
which come down to two pipeline stalls whereas we had none with
the unmodified code.

One pipeline stall for loading the address of __m_cnt_hi and reading
its value, followed by the same thing for oscr2ns_scale.

I think this is showing the problem of compiler barriers - they are
indescriminate.  They are total and complete barriers - not only do
they act on the data but also the compilers ability to emit code for
generating the addresses of the data to be loaded.

Clearly, the address of OSCR, __m_cnt_hi nor oscr2ns_scale is ever
going to change at run time - their addresses are all stored in the
literal pool, but by putting compiler barriers in, the compiler is
being prevented from reading from the literal pool at the most
appropriate point.

So, I've tried this:

 unsigned long long sched_clock(void)
 {
+       unsigned long *oscr2ns_ptr = &oscr2ns_scale;
        unsigned long long v = cnt32_to_63(OSCR);
-       return (v * oscr2ns_scale) >> OSCR2NS_SCALE_FACTOR;
+       return (v * *oscr2ns_ptr) >> OSCR2NS_SCALE_FACTOR;
 }

to try to explicitly code the loads.  This unfortunately results in
three pipeline stalls.  Also tried swapping the two lines starting
'unsigned long' without any improvement on not having those extra hacks
to work around the barrier.

So, let's summarise this:

1. the existing code works, is correct on ARM, and is efficient.
2. throwing barriers into the function makes it less efficient.
3. re-engineering the code appears to make things worse.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ