Date:	Fri, 21 Dec 2012 21:57:40 -0500
From:	Rik van Riel <riel@...hat.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	linux-kernel@...r.kernel.org, aquini@...hat.com, walken@...gle.com,
	lwoodman@...hat.com, jeremy@...p.org,
	Jan Beulich <JBeulich@...ell.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

On 12/21/2012 07:48 PM, Eric Dumazet wrote:
> On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
>> Argh, the first one had a typo in it that did not influence
>> performance with fewer threads running, but that made things
>> worse with more than a dozen threads...
>>
>> Please let me know if you can break these patches.
>> ---8<---
>> Subject: x86,smp: auto tune spinlock backoff delay factor
>
>> +#define MIN_SPINLOCK_DELAY 1
>> +#define MAX_SPINLOCK_DELAY 1000
>> +DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };
>
> Using a single spinlock_delay per cpu assumes there is a single
> contended spinlock on the machine, or that contended
> spinlocks protect the same critical section.

The goal is to reduce bus traffic, and keep total
system performance from falling through the floor.

If we have one lock that takes N cycles to acquire,
and a second contended lock that takes N*2 cycles
to acquire, a shared delay means we end up polling
the first lock fewer times than ideal before we get
it, and the second lock more times, but average
system throughput should still come out about the
same.

I suspect this approach should work well if we have
multiple contended locks in the system.
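
To make the intent concrete, here is a rough sketch of the
kind of loop I mean. This is simplified and not the actual
patch (the real code spins on the ticket head/tail rather
than on trylock); it just reuses the MIN/MAX constants and
the per-cpu spinlock_delay from above:

static void spin_wait_with_backoff(arch_spinlock_t *lock)
{
	int delay = __this_cpu_read(spinlock_delay);
	bool contended = false;

	while (!arch_spin_trylock(lock)) {
		int loops = delay;

		contended = true;
		while (loops--)
			cpu_relax();
	}

	/*
	 * One tuned value per CPU, shared by whatever contended
	 * lock this CPU runs into next: wait longer between polls
	 * if we had to spin, back off towards the minimum if not.
	 */
	if (contended && delay < MAX_SPINLOCK_DELAY)
		delay++;
	else if (!contended && delay > MIN_SPINLOCK_DELAY)
		delay--;

	__this_cpu_write(spinlock_delay, delay);
}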

> Given that we probably know where the contended spinlocks are, couldn't
> we use a real scalable implementation for them?

The scalable locks tend to have a slightly more
complex locking API, resulting in a slightly
higher overhead in the non-contended (normal)
case.  That means we cannot use them everywhere.
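
For comparison, an MCS style queue lock looks roughly like
this; note the extra queue node the caller has to allocate
and pass in (and hand to the unlock side again), which is
what makes the API harder to drop in everywhere. Sketch
only, not any particular implementation:

struct mcs_node {
	struct mcs_node *next;
	int		 locked;
};

static void mcs_lock(struct mcs_node **lock, struct mcs_node *node)
{
	struct mcs_node *prev;

	node->next = NULL;
	node->locked = 0;

	/* Atomically append ourselves to the tail of the waiter queue. */
	prev = xchg(lock, node);
	if (!prev)
		return;		/* lock was free, no spinning at all */

	ACCESS_ONCE(prev->next) = node;

	/* Each waiter spins on its own node, not on a shared lock word. */
	while (!ACCESS_ONCE(node->locked))
		cpu_relax();
}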

Also, scalable locks merely make sure that N+1
CPUs perform the same as N CPUs when there is
lock contention.  They do not cause the system
to actually scale.

For actual scalability, the data structure itself
would need to be changed, so that the locking can
be made finer grained.
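
As a trivial example of what I mean, hashing one globally
locked list into per bucket locks means CPUs mostly stop
touching the same lock at all (struct item, add_item() and
the bucket count below are made up for illustration):

#define ITEM_HASH_BITS	6
#define NR_ITEM_BUCKETS	(1 << ITEM_HASH_BITS)

struct item_bucket {
	spinlock_t		lock;
	struct list_head	items;
} ____cacheline_aligned_in_smp;

static struct item_bucket item_buckets[NR_ITEM_BUCKETS];

static void add_item(struct item *item)
{
	struct item_bucket *b;

	b = &item_buckets[hash_ptr(item, ITEM_HASH_BITS)];

	/* Most CPUs hash to different buckets, so they rarely contend. */
	spin_lock(&b->lock);
	list_add(&item->list, &b->items);
	spin_unlock(&b->lock);
}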

> A known contended one is the Qdisc lock in network layer. We added a
> second lock (busylock) to lower a bit the pressure on a separate cache
> line, but a scalable lock would be much better...

My locking patches are meant for dealing with the
offenders we do not know about, to make sure that
system performance does not fall off a cliff when
we run into a surprise.

Known scalability bugs we can fix.

Unknown ones should not cause somebody's system
to fail.

> I guess there are patent issues...

At least one of the scalable lock implementations
(the MCS queue lock) has been known since 1991, so
there should not be any patent issues with that
one.
