Date:	Thu, 3 Jan 2013 04:31:00 -0800
From:	Michel Lespinasse <walken@...gle.com>
To:	Rik van Riel <riel@...hat.com>
Cc:	linux-kernel@...r.kernel.org, aquini@...hat.com,
	eric.dumazet@...il.com, lwoodman@...hat.com, jeremy@...p.org,
	Jan Beulich <JBeulich@...ell.com>,
	Thomas Gleixner <tglx@...utronix.de>, knoel@...hat.com
Subject: Re: [RFC PATCH 3/5] x86,smp: auto tune spinlock backoff delay factor

On Wed, Jan 2, 2013 at 9:23 PM, Rik van Riel <riel@...hat.com> wrote:
> Proportional spinlock delay with a high delay factor works well
> when there is lots of contention on a lock. Likewise, a smaller
> delay factor works well when a lock is lightly contended.
>
> Making the code auto-tune the delay factor results in a system
> that performs well with both light and heavy lock contention.

I don't quite like that part of the explanation. Ideally I would like
to explain that there are huge benefits in making the delay at least
as long as a no-load spin/release section; that beyond that point a
longer delay brings only small additional benefits when the spinlock
hold times are larger; and why linear increase / exponential decay
works nicely here.

I'll see if I can make a more concrete proposal and still keep it
short enough :)

> +#define MIN_SPINLOCK_DELAY 1
> +#define MAX_SPINLOCK_DELAY 16000
> +DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };

unsigned would seem more natural here, though it's only a tiny detail.

>  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>  {
>         __ticket_t head = inc.head, ticket = inc.tail;
>         __ticket_t waiters_ahead;
> +       int delay = __this_cpu_read(spinlock_delay);

I like that you used __this_cpu_read() - your v1 version wasn't as nice.

> +
> +               /*
> +                * The lock is still busy; slowly increase the delay. If we
> +                * end up sleeping too long, the code below will reduce the
> +                * delay. Ideally we acquire the lock in the tight loop above.
> +                */
> +               if (!(head % 7) && delay < MAX_SPINLOCK_DELAY)
> +                       delay++;
> +
> +               loops = delay * waiters_ahead;

I don't like the head % 7 thing. I think using fixed point arithmetic
would be nicer:

if (delay < MAX_SPINLOCK_DELAY)
  delay += 256/7; /* Or whatever constant we choose */

loops = (delay * waiters_ahead) >> 8;

Also, we should probably skip the delay increment on the first loop
iteration - after all, we haven't waited yet, so we can't say that the
delay was too short.

> -               if (head == ticket)
> +               if (head == ticket) {
> +                       /*
> +                        * We overslept and have no idea how long the lock
> +                        * went idle. Reduce the delay as a precaution.
> +                        */
> +                       delay -= delay/32 + 1;

There is a possibility of integer underflow here. It seems that the
window to hit it would be quite small, but it can happen (at the very
least, imagine getting an interrupt at an inopportune time which would
make it look like you overslept). I think this may be what was causing
the stability issues I noticed.

I'll try to come up with a custom version of this patch. I do feel
that we're going in a good direction in general, though.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.