Message-ID: <1357218343.21409.24347.camel@edumazet-glaptop>
Date: Thu, 03 Jan 2013 05:05:43 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Michel Lespinasse <walken@...gle.com>
Cc: Rik van Riel <riel@...hat.com>, linux-kernel@...r.kernel.org,
aquini@...hat.com, lwoodman@...hat.com, jeremy@...p.org,
Jan Beulich <JBeulich@...ell.com>,
Thomas Gleixner <tglx@...utronix.de>, knoel@...hat.com
Subject: Re: [RFC PATCH 4/5] x86,smp: keep spinlock delay values per hashed spinlock address
On Thu, 2013-01-03 at 04:48 -0800, Michel Lespinasse wrote:
> On Wed, Jan 2, 2013 at 9:24 PM, Rik van Riel <riel@...hat.com> wrote:
> > From: Eric Dumazet <eric.dumazet@...il.com>
> >
> > Eric Dumazet found a regression with the spinlock backoff code,
> > in workloads where multiple spinlocks were contended, each having
> > a different wait time.
>
> I think you should really clarify that the regression was observed
> with version 1 of your proposal. At that time,
>
> 1- the autotune code tended to use delays that were too long for long-held locks, and
>
> 2- there was no exponential backoff, which meant that sharing stats
> between a long-held and a short-held spinlock could really hurt
> throughput on the short-held one
>
>
> I believe that with autotune v2, this really shouldn't be a problem and
> stats sharing should result in just using a delay that's appropriate
> for the shorter of the two lock hold times - which is not the optimal
> value, but is actually close enough performance-wise and, most
> importantly, should not cause any regression when compared to current
> mainline.
>
> (it's important to point that out because otherwise, you might trick
> Linus into thinking your patches are risky, which I think they
> shouldn't be after you implemented exponential backoff)
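
For reference, the exponential backoff shape under discussion looks
roughly like the sketch below. The doubling step, the cap and the
helper name are my illustrative assumptions, not the exact code from
either version of the series:

static void backoff_spin_wait(arch_spinlock_t *lock, __ticket_t ticket,
			      u32 delay)
{
	while (ACCESS_ONCE(lock->tickets.head) != ticket) {
		u32 loops = delay;

		/* Burn roughly 'delay' iterations between reads of the lock. */
		while (loops--)
			cpu_relax();

		/*
		 * Grow the delay while we keep missing the lock: even if a
		 * hash collision handed us a value tuned for a shorter hold
		 * time, we quickly climb back to a suitable delay.
		 */
		if (delay < MAX_SPINLOCK_DELAY)
			delay <<= 1;
	}
}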
>
> > void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
> > {
> > __ticket_t head = inc.head, ticket = inc.tail;
> > __ticket_t waiters_ahead;
> > - int delay = __this_cpu_read(spinlock_delay);
> > + u32 hash = hash32_ptr(lock);
> > + u32 slot = hash_32(hash, DELAY_HASH_SHIFT);
> > + struct delay_entry *ent = &__get_cpu_var(spinlock_delay[slot]);
> > + u32 delay = (ent->hash == hash) ? ent->delay : MIN_SPINLOCK_DELAY;
>
> IMO we want to avoid MIN_SPINLOCK_DELAY here. The exponential backoff
> autotune should make us resilient to collisions (if they happen we'll
> just end up with something very close to the min of the delays that
> would have been appropriate for either lock), so it should be better
> to just let collisions happen rather than force the use of
> MIN_SPINLOCK_DELAY.
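
The hunk above references DELAY_HASH_SHIFT, struct delay_entry and a
per-cpu spinlock_delay[] table that this excerpt doesn't show. A
plausible reconstruction, with every size and value a guess rather
than the patch's actual numbers:

#define DELAY_HASH_SHIFT	6
#define DELAY_HASH_SIZE		(1 << DELAY_HASH_SHIFT)
#define MIN_SPINLOCK_DELAY	1
#define MAX_SPINLOCK_DELAY	16000

struct delay_entry {
	u32 hash;	/* hash32_ptr() of the lock this slot last served */
	u32 delay;	/* auto-tuned delay for that lock, in loop iterations */
};

static DEFINE_PER_CPU(struct delay_entry [DELAY_HASH_SIZE], spinlock_delay);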
>
Exponential backoff won't help; I tried this idea last week and found
that it's better to detect a hash collision and safely fall back to
MIN_SPINLOCK_DELAY in that case.
It's better not to overestimate the delay and spin much longer than
needed.
On a hash collision, we don't know the contention history of this
lock at all, unless we store the EWMA delay inside the lock itself.
(On x86 with NR_CPUS <= 256, we have a 16-bit hole in the spinlock
that we could use for this.)
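
One illustrative reading of that last remark, with the layout and the
EWMA weight being my assumptions rather than a tested proposal:

/*
 * Hypothetical: with NR_CPUS <= 256 the two u8 tickets use only half
 * of a 32-bit lock word, leaving 16 bits for a per-lock EWMA of the
 * delay, which would survive hash collisions.
 */
typedef struct arch_spinlock {
	union {
		u32 word;
		struct {
			struct __raw_tickets {
				u8 head, tail;
			} tickets;
			u16 ewma_delay;	/* hypothetical history field */
		};
	};
} arch_spinlock_t;

/* Fold a measured delay sample into the stored history; the 1/4
 * weight is an arbitrary illustrative choice.
 */
static inline void spin_delay_update(arch_spinlock_t *lock, u16 sample)
{
	u16 old = lock->ewma_delay;

	lock->ewma_delay = old - (old >> 2) + (sample >> 2);
}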
--