Message-ID: <m2mxk35ji7.fsf@firstfloor.org>
Date:	Wed, 06 Apr 2011 11:20:48 -0700
From:	Andi Kleen <andi@...stfloor.org>
To:	Andy Lutomirski <luto@....EDU>
Cc:	x86@...nel.org, linux-kernel@...r.kernel.org,
	John Stultz <johnstul@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 0/6] x86-64: Micro-optimize vclock_gettime

Andy Lutomirski <luto@....EDU> writes:

> This series speeds up vclock_gettime(CLOCK_MONOTONIC) by almost 30%
> (tested on Sandy Bridge).  They're ordered in roughly decreasing order
> of improvement.
>
> These are meant for 2.6.40, but if anyone wants to take some of them
> for 2.6.39 I won't object.

I read the whole patchkit and it looks good to me.  I felt a bit uneasy
about the barrier changes though; it may be worth running the paranoid
"check monotonicity on lots of cpus" test cases to double-check on
different CPUs.  The interesting cases are: P4-Prescott, Merom
(C2Duo), AMD K8.
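
A minimal userspace sketch of that kind of monotonicity check follows.
It is not one of the existing test cases; the names now_ns(),
last_seen_ns and check_monotonic() are made up for illustration. The
idea is that every thread first loads the newest timestamp any thread
has published, then reads the clock itself, and counts a violation if
its own reading is older.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Newest timestamp published by any thread (hypothetical shared state). */
static _Atomic uint64_t last_seen_ns;

/* Read CLOCK_MONOTONIC and flatten it to nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Returns how often the clock appeared to step backwards.
 * The shared maximum is loaded BEFORE the clock is read, so a reading
 * older than that maximum cannot be explained by thread interleaving.
 */
unsigned long check_monotonic(unsigned long iterations)
{
    unsigned long violations = 0;

    for (unsigned long i = 0; i < iterations; i++) {
        uint64_t prev = atomic_load(&last_seen_ns);
        uint64_t now = now_ns();

        if (now < prev)
            violations++;

        /* Raise the shared maximum; a failed CAS reloads prev. */
        while (prev < now &&
               !atomic_compare_exchange_weak(&last_seen_ns, &prev, now))
            ;
    }
    return violations;
}
```

To exercise the cross-CPU case, check_monotonic() would be run
concurrently from one thread pinned to each CPU; a non-zero return on
any of them would be worth investigating.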

Thanks for doing these optimizations again. Before the generic clock
source code, these functions used to be somewhat faster, but they
regressed significantly back then. It may be worth comparing the current
asm code against the old code to see if there's still something
obvious missing.

Possible further optimizations, if you're still motivated:
 
- Move all the timer state/seqlock into one cache line and start
with a prefetch.
I made a similar attempt recently for the in-kernel timers.
You won't see any difference in a micro-benchmark loop, but you may
in a workload that dirties a lot of cache between timer calls.

- Replace the indirect call in vread() with an
if (timer == TSC) inline() else indirect_call
(essentially manual devirtualization)

- Replace the sysctl checks with code patching, using the new
static branch framework
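
The manual devirtualization idea can be sketched in plain C as below.
This is only an illustration, not the vsyscall code: struct clock_state,
tsc_read_inline(), slow_read() and read_cycles() are hypothetical names,
and a plain counter stands in for the real TSC read so the example stays
portable.

```c
#include <stdint.h>

typedef uint64_t (*vread_fn)(void);

/* Hypothetical per-clocksource state: a type flag plus the generic hook. */
struct clock_state {
    int      is_tsc;   /* non-zero when the active clocksource is the TSC */
    vread_fn vread;    /* generic reader, used for everything else */
};

/* Stand-in for the hardware counter; real code would read the TSC here. */
static uint64_t fake_tsc;

static inline uint64_t tsc_read_inline(void)
{
    return ++fake_tsc;
}

/* Example of a non-TSC reader that is only reachable through the pointer. */
static uint64_t slow_read(void)
{
    return 42;
}

/*
 * Common case: test the flag and take the inlined path, which the
 * compiler can fold into the caller. Only the uncommon clocksources
 * pay for the indirect call.
 */
static inline uint64_t read_cycles(const struct clock_state *cs)
{
    if (cs->is_tsc)
        return tsc_read_inline();
    return cs->vread();
}
```

With a state of { 1, NULL } every read_cycles() call goes through the
inlined path; with { 0, slow_read } it falls back to the indirect call.
The gain comes from replacing a hard-to-predict indirect branch with a
well-predicted conditional one in the common case.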

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only