Message-ID: <1453904765-11073-1-git-send-email-mst@redhat.com>
Date: Wed, 27 Jan 2016 17:10:12 +0200
From: "Michael S. Tsirkin" <mst@...hat.com>
To: linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Davidlohr Bueso <dave@...olabs.net>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
the arch/x86 maintainers <x86@...nel.org>,
Davidlohr Bueso <dbueso@...e.de>,
"H. Peter Anvin" <hpa@...or.com>,
virtualization <virtualization@...ts.linux-foundation.org>,
Borislav Petkov <bp@...en8.de>
Subject: [PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than the lock; addl that we use on older CPUs.
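For reference, the two flavors look roughly like this (a minimal sketch
with made-up names, not the exact kernel macros - the real definitions
live in arch/x86/include/asm/barrier.h; 32-bit form shown, the 64-bit
variant would use %%rsp):

    /* mfence-based barrier, as used on modern x86 */
    #define mb_mfence()  asm volatile("mfence" ::: "memory")

    /* locked-op barrier: the lock prefix makes the otherwise no-op
     * addition a full memory barrier; per the micro-benchmark it is
     * 2-3x faster.  The add targets SP-4 as proposed in this series. */
    #define mb_locked()  asm volatile("lock; addl $0,-4(%%esp)" \
                                      ::: "memory", "cc")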
So we really should use the locked variant everywhere, except that the Intel
manual says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seem to assume sfence will
order it, so there could be existing bugs around this code.
Fortunately, no callers of clflush (except one) order it using smp_mb(), so
after fixing that one caller it seems safe to override smp_mb() straight away.
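The fix for that one caller follows this pattern (a sketch of the idea
behind patch 4, not the exact diff; ptr stands in for whatever cache
line is being flushed):

    mb();          /* mfence: order earlier stores before the flush */
    clflush(ptr);  /* flush the cache line */
    mb();          /* mfence: order the flush before later accesses */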
Down the road, it might make sense to introduce clflush_mb() and switch
clflush callers over to it.
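Such a helper might look like this (purely illustrative - clflush_mb()
does not exist yet; the name and shape are only a suggestion):

    /* hypothetical: clflush with the ordering the manual requires */
    static inline void clflush_mb(volatile void *p)
    {
        mb();        /* mfence before the flush */
        clflush(p);
        mb();        /* mfence after the flush */
    }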
While I was at it, I found some inconsistencies in the comments in
arch/x86/include/asm/barrier.h.
The documentation fixes come first - I verified that
they do not change the generated code at all. Borislav Petkov
said they will appear in tip eventually; they are included here for
completeness.
The last patch changes __smp_mb() to lock; addl. I was unable to
measure a speed difference in a macro benchmark,
but I noted that even doing

    #define mb() barrier()

seems to make no difference for most benchmarks
(though it causes hangs sometimes, of course).
HPA asked that the last patch be deferred until we hear back from
Intel, which of course makes sense. So it needs HPA's ack.
Changes from v3:
Leave mb() alone for now since it's used to order
clflush, which requires mfence. Optimize smp_mb instead.
Changes from v2:
add patch adding cc clobber for addl
tweak commit log for patch 2
use addl at SP-4 (as opposed to SP) to reduce data dependencies (see the
sketch below)
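To illustrate those last two points (a sketch, 32-bit form; the 64-bit
variant would use %%rsp):

    /* "cc" tells the compiler that addl clobbers EFLAGS; targeting
     * -4(%esp) rather than (%esp) leaves the live top-of-stack slot
     * alone, so later loads of it don't depend on the locked op */
    asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc");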
Michael S. Tsirkin (5):
x86: add cc clobber for addl
x86: drop a comment left over from X86_OOSTORE
x86: tweak the comment about use of wmb for IO
x86: use mb() around clflush
x86: drop mfence in favor of lock+addl
arch/x86/include/asm/barrier.h | 17 ++++++++---------
arch/x86/kernel/process.c | 4 ++--
2 files changed, 10 insertions(+), 11 deletions(-)
--
MST