linux-kernel - Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

Message-ID: <50DB5531.90500@redhat.com>
Date:	Wed, 26 Dec 2012 14:51:13 -0500
From:	Rik van Riel <riel@...hat.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	aquini@...hat.com, walken@...gle.com, lwoodman@...hat.com,
	jeremy@...p.org, Jan Beulich <JBeulich@...ell.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Tom Herbert <therbert@...gle.com>
Subject: Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay
 factor

On 12/26/2012 02:10 PM, Eric Dumazet wrote:

> I did some tests with your patches, using the following configuration:
>
> tc qdisc add dev eth0 root htb r2q 1000 default 3
> (to force a contention on qdisc lock, even with a multi queue net
> device)
>
> and 24 concurrent "netperf -t UDP_STREAM -H other_machine -- -m 128"
>
> Machine : 2 Intel(R) Xeon(R) CPU X5660  @ 2.80GHz
> (24 threads), and a fast NIC (10Gbps)
>
> Resulting in a 13 % regression (676 Mbits -> 595 Mbits)
>
> In this workload we have at least two contended spinlocks, with
> different delays. (spinlocks are not held for the same duration)
>
> It clearly defeats your assumption of a single per-cpu delay being OK:
> some cpus keep spinning too long after the lock has been released.

Thank you for breaking my patches.

I had been thinking about ways to deal with multiple
spinlocks, and hoping there would not be a serious
issue with systems contending on multiple locks.

> We might try to use a hash on lock address, and an array of 16 different
> delays so that different spinlocks have a chance of not sharing the same
> delay.
>
> With the following patch, I get 982 Mbits/s with the same bench, so an
> increase of 45 % instead of a 13 % regression.

Thank you even more for fixing my patches :)

That is a huge win!

Could I have your Signed-off-by: line, so I can merge
your hashed spinlock slots in?

I will probably keep it as a separate patch 4/4, with
your report and performance numbers in it, to preserve
the reason why we keep multiple hashed values, etc...
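
For the record, the hashed-slot idea can be sketched roughly as below.
This is a minimal userspace illustration, not the actual patch: the
names (lock_hash, delay_for, delay_adjust), the multiplicative hash,
and the MIN_DELAY/MAX_DELAY bounds are all assumptions made for the
example. In the kernel the table would be per cpu; here it is a plain
static array. Only the 16-slot table indexed by a hash of the lock
address comes from the suggestion quoted above.

```c
#include <stdint.h>

#define DELAY_HASH_SHIFT 4                        /* 2^4 = 16 slots */
#define DELAY_HASH_SLOTS (1u << DELAY_HASH_SHIFT)
#define MIN_DELAY 1u                              /* hypothetical lower bound */
#define MAX_DELAY 1024u                           /* hypothetical upper bound */

/* One tuned delay per slot; per cpu in a real kernel, global here. */
static unsigned int delays[DELAY_HASH_SLOTS];

/* Map a lock address to a slot index in [0, DELAY_HASH_SLOTS). */
static unsigned int lock_hash(const void *lock)
{
	uint64_t v = (uint64_t)(uintptr_t)lock;

	/* Multiplicative hash: the top bits are the best mixed ones. */
	v *= 0x9E3779B97F4A7C15ull;
	return (unsigned int)(v >> (64 - DELAY_HASH_SHIFT));
}

/* Return the delay slot used for this lock, initialising it lazily. */
static unsigned int *delay_for(const void *lock)
{
	unsigned int *d = &delays[lock_hash(lock)];

	if (*d < MIN_DELAY)
		*d = MIN_DELAY;
	return d;
}

/* Move this lock's delay by 'step', clamped to [MIN_DELAY, MAX_DELAY]. */
static void delay_adjust(const void *lock, int step)
{
	unsigned int *d = delay_for(lock);
	long v = (long)*d + step;

	if (v < (long)MIN_DELAY)
		v = MIN_DELAY;
	if (v > (long)MAX_DELAY)
		v = MAX_DELAY;
	*d = (unsigned int)v;
}
```

Two locks that hash to different slots are tuned independently, so a
short-hold lock no longer inherits the delay learned on a long-hold
one. Two locks that collide still share a delay, which is the
trade-off accepted for keeping the table small.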

There is enough stuff in this code that will be
indistinguishable from magic if we do not document it
properly...

-- 
All rights reversed