Date:	Wed, 06 May 2015 13:55:30 +0200
From:	Juergen Gross <jgross@...e.com>
To:	Jeremy Fitzhardinge <jeremy@...p.org>,
	linux-kernel@...r.kernel.org, x86@...nel.org, hpa@...or.com,
	tglx@...utronix.de, mingo@...hat.com,
	xen-devel@...ts.xensource.com, konrad.wilk@...cle.com,
	david.vrabel@...rix.com, boris.ostrovsky@...cle.com,
	chrisw@...s-sol.org, akataria@...are.com, rusty@...tcorp.com.au,
	virtualization@...ts.linux-foundation.org, gleb@...nel.org,
	pbonzini@...hat.com, kvm@...r.kernel.org
Subject: Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

On 05/05/2015 07:21 PM, Jeremy Fitzhardinge wrote:
> On 05/03/2015 10:55 PM, Juergen Gross wrote:
>> I did a small measurement of the pure locking functions on bare metal
>> without and with my patches.
>>
>> spin_lock() for the first time (lock and code not in cache) dropped from
>> about 600 to 500 cycles.
>>
>> spin_unlock() for the first time dropped from 145 to 87 cycles.
>>
>> spin_lock() in a loop dropped from 48 to 45 cycles.
>>
>> spin_unlock() in the same loop dropped from 24 to 22 cycles.
>
> Did you isolate icache hot/cold from dcache hot/cold? It seems to me the
> main difference will be whether the branch predictor is warmed up rather
> than whether the lock itself is in dcache, but it's much more likely that
> the lock code is in icache if the code is lock intensive, making the cold
> case moot. But that's pure speculation.
>
> Could you see any differences in workloads beyond microbenchmarks?
>
> Not that it's my call at all, but I think we'd need to see some concrete
> improvements in real workloads before adding the complexity of more pvops.
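
For reference, the kind of per-operation cycle measurement quoted above can
be sketched roughly as below. This is only an illustrative user-space
analogue (a pthread spinlock timed with RDTSCP); the actual numbers were
taken on the kernel's spin_lock()/spin_unlock() with a different harness,
so the absolute values won't match.

/* Illustrative sketch only -- not the harness behind the numbers above. */
#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
#include <x86intrin.h>		/* __rdtscp() */

static pthread_spinlock_t lock;	/* user-space stand-in for a kernel spinlock */

static inline uint64_t cycles(void)
{
	unsigned int aux;

	return __rdtscp(&aux);	/* RDTSCP waits for prior instructions to retire */
}

int main(void)
{
	uint64_t t0, t1, sum = 0;
	int i;

	pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

	/* one-shot: first acquisition after startup (the "first time" case above) */
	t0 = cycles();
	pthread_spin_lock(&lock);
	t1 = cycles();
	printf("first lock: %llu cycles\n", (unsigned long long)(t1 - t0));
	pthread_spin_unlock(&lock);

	/* averaged over a hot loop (the "in a loop" case above) */
	for (i = 0; i < 1000000; i++) {
		t0 = cycles();
		pthread_spin_lock(&lock);
		t1 = cycles();
		sum += t1 - t0;
		pthread_spin_unlock(&lock);
	}
	printf("hot lock: %llu cycles avg\n", (unsigned long long)(sum / 1000000));

	return 0;
}

(Build with something like gcc -O2 -pthread; the RDTSCP overhead is included
in each sample, which is fine for comparing before/after deltas.)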

I did another test on a larger machine:

25 kernel builds (time make -j 32) on a 32-core machine. Before each
build "make clean" was called; the first result after boot was omitted
to avoid disk cache warm-up effects.

System time without my patches: 861.5664 +/- 3.3665 s
               with my patches: 852.2269 +/- 3.6629 s
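
The summary lines can be reproduced from the individual runs along these
lines. This is an illustrative sketch only: it assumes the "+/-" figure is
the sample standard deviation over the kept builds (the exact statistic is
not stated above), and the run times in the array are placeholders.

#include <stdio.h>
#include <math.h>

int main(void)
{
	/* placeholder system times in seconds, one per kept build */
	double t[] = { 860.1, 858.7, 863.2, 861.9 };
	int i, n = sizeof(t) / sizeof(t[0]);
	double sum = 0.0, var = 0.0, mean;

	for (i = 0; i < n; i++)
		sum += t[i];
	mean = sum / n;

	for (i = 0; i < n; i++)
		var += (t[i] - mean) * (t[i] - mean);
	var /= n - 1;			/* sample variance */

	printf("%.4f +/- %.4f s\n", mean, sqrt(var));
	return 0;
}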


Juergen
