Message-ID: <CA+55aFxGWKJwo1CfS3EJFagUb+JRtc+URJOu2yz2RztWWrkraA@mail.gmail.com>
Date:	Tue, 3 Sep 2013 11:34:10 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Al Viro <viro@...iv.linux.org.uk>,
	Sedat Dilek <sedat.dilek@...il.com>,
	Waiman Long <waiman.long@...com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Jeff Layton <jlayton@...hat.com>,
	Miklos Szeredi <mszeredi@...e.cz>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Andi Kleen <andi@...stfloor.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arnaldo Carvalho de Melo <acme@...radead.org>
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless
 update of refcount

On Tue, Sep 3, 2013 at 8:41 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> I've done that, and it matches the PEBS runs, except obviously with
> the instruction skew (so then depending on run it's 95% the
> instruction after the xadd). So the PEBS profiles are entirely
> consistent with other data.

So one thing that strikes me about our lg-locks is that they are
designed to be cheap, but they force this insane 3-deep memory access
chain to lock them.

That may be a large part of why lg_local_lock shows up so clearly on
my profiles: the single "lock xadd" instruction ends up not just being
serializing, but it is what actually consumes the previous memory
reads.

The core of the lg_local_lock sequence ends up being this
four-instruction sequence:

    mov    (%rdi),%rdx
    add    %gs:0xcd48,%rdx
    mov    $0x100,%eax
    lock   xadd   %ax,(%rdx)

and that's a nasty chain of dependent memory loads. First we load the
percpu address, then we add the percpu offset to that, and then we do
the xadd on the result.
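
As a rough user-space illustration of that chain (a sketch, not the kernel's code), the three steps can be modelled in C11. All names here (`lglock_model`, `percpu_area`, `percpu_base`, `NR_CPUS_MODEL`, `lg_local_lock_model`) are hypothetical, the per-cpu base is a plain array instead of a `%gs`-relative load, and the model only takes a ticket without spinning on it:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4      /* hypothetical CPU count for this model */
#define TICKET_INC 0x100u    /* the $0x100 that the xadd adds to the 16-bit lock word */

/* Stand-in for one CPU's per-cpu area (the kernel finds it via %gs). */
struct percpu_area {
    _Atomic uint16_t lock;   /* 16-bit lock word updated by the xadd */
};

static struct percpu_area areas[NR_CPUS_MODEL];
static char *percpu_base[NR_CPUS_MODEL];   /* models the per-cpu offset in memory */

/* Stand-in for the global lglock: it only stores where the per-cpu lock lives. */
struct lglock_model {
    size_t lock_offset;      /* models the per-cpu pointer held in the global struct */
};

static void init_percpu_model(void)
{
    for (int i = 0; i < NR_CPUS_MODEL; i++)
        percpu_base[i] = (char *)&areas[i];
}

/* Returns the old lock word, as xadd leaves it in %ax. */
static uint16_t lg_local_lock_model(struct lglock_model *lg, int cpu)
{
    size_t off = lg->lock_offset;              /* load 1: mov (%rdi),%rdx     */
    char *addr = percpu_base[cpu] + off;       /* load 2: add %gs:...,%rdx    */
    _Atomic uint16_t *lock = (_Atomic uint16_t *)addr;
    return atomic_fetch_add(lock, TICKET_INC); /* access 3: lock xadd %ax,(%rdx) */
}
```

Each step needs the result of the one before it, so the atomic add cannot even start its memory access until both earlier loads have completed.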

It's kind of sad, because in *theory* we could get rid of that whole
thing entirely, and just do it as one single

    mov    $0x100,%eax
    lock xadd %ax,%gs:vfsmount_lock

that only has one single memory access, not three dependent ones.
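
The closest user-space analogue I can sketch is a C11 thread-local atomic, where the "per-cpu" object is addressed directly through a segment register. This is an analogy only: it is per-thread rather than per-cpu, the names `local_lock_word` and `local_lock_single_access` are made up for the example, and whether the compiler really emits a single segment-relative `lock xadd` depends on the target and the TLS model in use.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TICKET_INC 0x100u    /* same increment as the $0x100 in the asm above */

/* One lock word per thread, standing in for a true per-cpu spinlock. */
static _Thread_local _Atomic uint16_t local_lock_word;

/*
 * No pointer load and no base-address add in the source: the only
 * memory access written here is the atomic read-modify-write itself.
 * Returns the old lock word, as xadd would leave it in %ax.
 */
static uint16_t local_lock_single_access(void)
{
    return atomic_fetch_add(&local_lock_word, TICKET_INC);
}
```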

But the two extra memory accesses come from:

 - the lglock data structure isn't a percpu data structure, it's this
stupid global data structure that has a percpu pointer in it.  So that
first "mov (%rdi),%rdx" is purely to load what is effectively a
constant address (per lglock).

   And that's not because it wants to be, but because we associate
global lockdep data with it. Ugh. If it wasn't for that, we could just
make them percpu.

 - we don't have a percpu spinlock accessor, so we always need to turn
the percpu address into a global address by adding the percpu base
(and that's the "add %gs:...,%rdx" part).

Oh well. This whole "lg_local_lock" is really noticeable on my
test-case mainly because my test-case only stat's a pathname with a
single path component, so the whole lookup really is dominated by all
the "setup/teardown" code. Real loads tend to look up much longer
pathnames, so the setup/teardown isn't so dominant, and actually
looking up the dentries from the hash chain is where most of the time
goes. But it's annoying to have that one big spike in the profile and
not be able to do anything about it.

           Linus