Message-ID: <1356549008.20133.20856.camel@edumazet-glaptop>
Date: Wed, 26 Dec 2012 11:10:08 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Rik van Riel <riel@...hat.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
	aquini@...hat.com, walken@...gle.com, lwoodman@...hat.com,
	jeremy@...p.org, Jan Beulich <JBeulich@...ell.com>,
	Thomas Gleixner <tglx@...utronix.de>, Tom Herbert <therbert@...gle.com>
Subject: Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

On Fri, 2012-12-21 at 22:50 -0500, Rik van Riel wrote:
> I will try to run this test on a really large SMP system
> in the lab during the break.
>
> Ideally, the auto-tuning will keep the delay value large
> enough that performance will stay flat even when there are
> 100 CPUs contending over the same lock.
>
> Maybe it turns out that the maximum allowed delay value
> needs to be larger. Only one way to find out...
>

Hi Rik

I did some tests with your patches with the following configuration:

tc qdisc add dev eth0 root htb r2q 1000 default 3
(to force contention on the qdisc lock, even with a multiqueue net device)

and 24 concurrent "netperf -t UDP_STREAM -H other_machine -- -m 128"

Machine: 2 Intel(R) Xeon(R) CPU X5660 @ 2.80GHz (24 threads), and a
fast NIC (10Gbps)

Result: a 13 % regression (676 Mbits/s -> 595 Mbits/s)

In this workload we have at least two contended spinlocks, with
different delays (the spinlocks are not held for the same duration).

This clearly defeats the assumption that a single per-cpu delay is OK:
some cpus keep spinning too long after the lock has been released.

We might try using a hash of the lock address to index an array of 16
different delays, so that different spinlocks have a chance of not
sharing the same delay value.

With the following patch, I get 982 Mbits/s with the same bench, so an
increase of 45 % instead of a 13 % regression.

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 48d2b7d..59f98f6 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -23,6 +23,7 @@
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
 #include <linux/gfp.h>
+#include <linux/hash.h>
 
 #include <asm/mtrr.h>
 #include <asm/tlbflush.h>
@@ -113,6 +114,55 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
 static bool smp_no_nmi_ipi = false;
 
 /*
+ * Wait on a congested ticket spinlock.
+ */
+#define MIN_SPINLOCK_DELAY 1
+#define MAX_SPINLOCK_DELAY 1000
+#define DELAY_HASH_SHIFT 4
+DEFINE_PER_CPU(int [1 << DELAY_HASH_SHIFT], spinlock_delay) = {
+	MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
+	MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
+	MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
+	MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
+	MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
+	MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
+	MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
+	MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
+};
+void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
+{
+	unsigned int slot = hash_32((u32)(unsigned long)lock, DELAY_HASH_SHIFT);
+	int delay = __this_cpu_read(spinlock_delay[slot]);
+
+	for (;;) {
+		int loops = delay * (__ticket_t)(inc.tail - inc.head);
+
+		while (loops--)
+			cpu_relax();
+
+		inc.head = ACCESS_ONCE(lock->tickets.head);
+
+		if (inc.head == inc.tail) {
+			/* Decrease the delay, since we may have overslept. */
+			if (delay > MIN_SPINLOCK_DELAY)
+				delay--;
+			break;
+		}
+
+		/*
+		 * The lock is still busy, the delay was not long enough.
+		 * Going through here 2.7 times will, on average, cancel
+		 * out the decrement above. Using a non-integer number
+		 * gets rid of performance artifacts and reduces oversleeping.
+		 */
+		if (delay < MAX_SPINLOCK_DELAY &&
+		    ((inc.head & 3) == 0 || (inc.head & 7) == 1))
+			delay++;
+	}
+	__this_cpu_write(spinlock_delay[slot], delay);
+}
+
+/*
  * this function sends a 'reschedule' IPI to another CPU.
  * it goes straight through and wastes no time serializing
  * anything. Worst case is that we lose a reschedule ...
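
For readers who want to experiment with the slot-selection idea outside the
kernel, here is a minimal user-space sketch. It assumes a hash_32()-style
golden-ratio multiplicative hash and a 16-entry per-thread delay array;
hash32(), delay_slot() and the constants below are illustrative stand-ins,
not the kernel's code.

	/*
	 * Sketch: hash a lock address into one of 16 per-thread delay
	 * slots, so two locks with very different hold times do not
	 * fight over a single adaptive backoff value.  hash32() mimics
	 * the kernel's golden-ratio multiplicative hash; the real patch
	 * keeps the array per-CPU and tunes each slot's delay adaptively.
	 */
	#include <stdint.h>
	#include <stdio.h>

	#define DELAY_HASH_SHIFT 4
	#define DELAY_SLOTS (1 << DELAY_HASH_SHIFT)

	/* Golden-ratio multiplicative hash of a 32-bit value. */
	static unsigned int hash32(uint32_t val, unsigned int bits)
	{
		return (val * 0x9e370001U) >> (32 - bits);
	}

	/* One adaptive delay per (thread, slot); per-CPU in the patch. */
	static __thread int spinlock_delay[DELAY_SLOTS];

	static int *delay_slot(const void *lock)
	{
		unsigned int slot = hash32((uint32_t)(uintptr_t)lock,
					   DELAY_HASH_SHIFT);

		return &spinlock_delay[slot];
	}

	int main(void)
	{
		int lock_a, lock_b;	/* stand-ins for two distinct spinlocks */

		printf("lock_a -> slot %td\n", delay_slot(&lock_a) - spinlock_delay);
		printf("lock_b -> slot %td\n", delay_slot(&lock_b) - spinlock_delay);
		return 0;
	}

Note on the increment heuristic in the patch: the condition fires when
(head & 3) == 0 or (head & 7) == 1, i.e. with probability 3/8, so on average
roughly 2.7 busy iterations are needed to undo one decrement, which is what
the "2.7 times" comment refers to.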