Message-ID: <CA+55aFyuZyNREYtKY7OacKar7KVD0pfPYyCo67kyGJkk063E7g@mail.gmail.com>
Date:	Mon, 3 Oct 2011 14:45:23 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Matt Fleming <matt@...sole-pimps.org>
Cc:	Oleg Nesterov <oleg@...hat.com>, Andi Kleen <andi@...stfloor.org>,
	Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
	Tony Luck <tony.luck@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	David Mosberger-Tang <davidm@...uge.net>
Subject: Re: [RFC][PATCH 0/5] Signal scalability series

On Mon, Oct 3, 2011 at 1:58 PM, Matt Fleming <matt@...sole-pimps.org> wrote:
>
> No, I don't think there was anything wrong with your testing method. I
> ran your command-line under Qemu and saw similar results - with the
> patches applied the single-threaded case slows down (not by 50%, it
> looks more like 25%, but that's still unacceptable and not at all what I
> had anticipated).

Splitting up locks fairly easily causes these kinds of problems.

On many modern microarchitectures, the serialization implied by
locking can be a *big* performance hit. If a system call goes from
taking a single big lock to taking two split locks, that alone can
make the system call very noticeably slower. The individual locks may
each protect a much smaller section and be "more scalable", but the
end result is clearly worse performance.
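
To make that concrete, here's a minimal user-space sketch of what the
fast path pays (hypothetical pthread code with made-up names, not
anything from the actual signal paths):

#include <pthread.h>

struct state {
        pthread_mutex_t big_lock;       /* old scheme: one lock */
        pthread_mutex_t lock_a;         /* new scheme: two locks */
        pthread_mutex_t lock_b;
        int a, b;
};

/* Old scheme: one atomic lock/unlock pair on the fast path. */
static void update_single(struct state *s)
{
        pthread_mutex_lock(&s->big_lock);
        s->a++;
        s->b++;
        pthread_mutex_unlock(&s->big_lock);
}

/* Split scheme: each critical section is smaller, but even the
 * completely uncontended single-threaded path now pays for two
 * atomic lock/unlock pairs instead of one. */
static void update_split(struct state *s)
{
        pthread_mutex_lock(&s->lock_a);
        s->a++;
        pthread_mutex_unlock(&s->lock_a);

        pthread_mutex_lock(&s->lock_b);
        s->b++;
        pthread_mutex_unlock(&s->lock_b);
}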

We've had that several times when we've made smaller locks (in the VM
in particular). One big lock that you take once can be way better than
two small ones that you have to take in sequence (or, worse still,
nested - that's when you can *really* get into exponential badness).
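
The nested variant (same made-up struct as in the sketch above) is
the one that really hurts, because a waiter on the inner lock is
holding the outer one the whole time:

/* Nested scheme: lock_b is taken while lock_a is held.  Anybody
 * waiting on lock_b is also holding lock_a, so contention on the
 * inner lock directly stretches the hold time of the outer one,
 * and the two convoy on each other. */
static void update_nested(struct state *s)
{
        pthread_mutex_lock(&s->lock_a);
        s->a++;
        pthread_mutex_lock(&s->lock_b);
        s->b++;
        pthread_mutex_unlock(&s->lock_b);
        pthread_mutex_unlock(&s->lock_a);
}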

And with even a very limited number of threads (or processes passing
signals back-and-forth) you can get a "train effect": two cores
accessing the same two locks in order, so that they get synchronized.
The "get synchronized" event itself might even be rare, but once it
happens, things can stay synchronized.

And if the second lock then always ends up blocking and/or causing
cacheline ping-pong, the slowdown can go up by an absolutely huge
amount, because you've basically made the "rare" case the common one.
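
Easy enough to see with two threads hammering the same two locks in
the same order (again a purely hypothetical sketch, nothing from the
patch series - compile with -pthread):

#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static long count_a, count_b;

/* Once one thread blocks on lock_a exactly as the other releases
 * it, the two tend to stay in lockstep, and every iteration then
 * migrates both lock words and the data they protect between the
 * two cores' caches. */
static void *worker(void *arg)
{
        (void)arg;
        for (long i = 0; i < 10000000; i++) {
                pthread_mutex_lock(&lock_a);
                count_a++;
                pthread_mutex_unlock(&lock_a);

                pthread_mutex_lock(&lock_b);
                count_b++;
                pthread_mutex_unlock(&lock_b);
        }
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}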

                           Linus
