Message-ID: <20151019112417.GA752@gmail.com>
Date:	Mon, 19 Oct 2015 13:24:17 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	ling.ma.program@...il.com, mingo@...hat.com,
	linux-kernel@...r.kernel.org, Ma Ling <ling.ml@...baba-inc.com>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Jiri Olsa <jolsa@...hat.com>
Subject: Re: [RFC PATCH] qspinlock: Improve performance by reducing load
 instruction rollback


* Peter Zijlstra <peterz@...radead.org> wrote:

> On Mon, Oct 19, 2015 at 09:58:23AM +0200, Ingo Molnar wrote:
> > 
> > * ling.ma.program@...il.com <ling.ma.program@...il.com> wrote:
> > 
> > > From: Ma Ling <ling.ml@...baba-inc.com>
> > > 
> > > All load instructions can execute speculatively, but they must still
> > > obey the memory ordering rules across multiple cores, as shown below:
> > > _x = _y = 0
> > > 
> > > Processor 0				Processor 1
> > > 
> > > mov r1, [ _y]  //M1			mov [ _x], 1  //M3
> > > mov r2, [ _x]  //M2			mov [ _y], 1  //M4
> > > 
> > > If r1 = 1, r2 must be 1
> > > 
> > > To guarantee the above rule, although Processor 0 may execute the
> > > M1 and M2 instructions out of order, both are kept in the ROB.
> > > When the load buffer for _x in Processor 0 receives the update
> > > message from Processor 1, Processor 0 needs to roll back from
> > > the M2 instruction, which flushes the whole pipeline. The latency
> > > exceeds the penalty of a branch misprediction.
> > > 
> > > In this patch we use the lock cmpxchg instruction to force the load
> > > instructions to be serialized. The destination operand
> > > receives a write cycle regardless of the result of
> > > the comparison, which helps us reduce the penalty
> > > from load instruction rollback.
> > > 
> > > Our experiments indicate that performance improves by 10%~15%
> > > in the 2- and 3-thread cases; in those cases most of the time
> > > is spent on conflicts over the lock cache line.
> > 
> > So it would be nice to create a new user-space spinlock testing facility, via a 
> > new 'perf bench spinlock' feature or so. That way others can test and validate 
> > your results on different hardware as well.
> 
> So it's trivial to lift this code into userspace -- in fact, I have that
> somewhere.
> 
> The trouble is going to be keeping them in sync.

So we can just try this optimistically, and if it keeps breaking, we can use the 
technique perf uses to sync up the rbtree implementation: we copy the kernel 
version into tooling, but run diff against the kernel version and warn at tool 
build time that there's divergence.

I.e. a non-build-fatal force that keeps things in sync.
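
A minimal sketch of that warn-but-don't-fail idea, written in C purely for
illustration. perf's real check runs diff at tool build time, which this
does not reproduce. The function names files_identical() and
check_copy_in_sync() are hypothetical and do not exist in perf or in the
kernel tree.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the two files have identical
 * contents, 0 if they differ, -1 if either file cannot be opened.
 */
int files_identical(const char *a_path, const char *b_path)
{
	FILE *a, *b;
	int ca, cb;

	assert(a_path && b_path);

	a = fopen(a_path, "rb");
	if (!a)
		return -1;
	b = fopen(b_path, "rb");
	if (!b) {
		fclose(a);
		return -1;
	}

	/* Compare byte by byte; identical files reach EOF together. */
	do {
		ca = fgetc(a);
		cb = fgetc(b);
	} while (ca == cb && ca != EOF);

	fclose(a);
	fclose(b);
	return ca == cb;
}

/*
 * Non-build-fatal check: print a warning when the tooling copy has
 * diverged (or cannot be compared), but always return 0 so the
 * calling build step never fails because of it.
 */
int check_copy_in_sync(const char *kernel_path, const char *tools_path)
{
	int same = files_identical(kernel_path, tools_path);

	if (same == 0)
		fprintf(stderr, "Warning: %s differs from %s\n",
			tools_path, kernel_path);
	else if (same < 0)
		fprintf(stderr, "Warning: could not compare %s with %s\n",
			tools_path, kernel_path);
	return 0;
}
```

A build step could call something like
check_copy_in_sync("lib/rbtree.c", "tools/lib/rbtree.c"), where both paths
are only illustrative. The important property is the unconditional
return 0: divergence is reported but never breaks the build.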

Thanks,

	Ingo