lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 3 Jun 2008 05:24:04 +0200
From:	Nick Piggin <npiggin@...e.de>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	David Howells <dhowells@...hat.com>,
	Ulrich Drepper <drepper@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 0/3] 64-bit futexes: Intro

On Tue, Jun 03, 2008 at 01:03:17AM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> > > That bit can be used as a lock and if all access to the state of 
> > > that atomic variable uses it, arbitrary higher-order atomic state 
> > > transitions can be derived from it. The cost would be a bit more 
> > > instructions in the fastpath, but there would still only be a single 
> > > atomic op (the acquire op), as the unlock would be a natural barrier 
> > > (on x86 at least).
> > 
> > No, "unlocks as a natural barrier" only works for exclusive kernel 
> > locks (spin_unlock and write_unlock). There we can just do a write to 
> > unlock. But for anything that wants to handle contention differently 
> > than just spinning, the unlock path needs to be able to do an atomic 
> > "unlock and test if I need to do something else", because it may need 
> > to wake things up.
> 
> yeah, indeed. Compared to all the other costs that have to be dealt with 
> here, having a second atomic op isnt all that much of an issue either, 
> especially on latest hw. An atomic op will probably never be as cheap as 
> a non-atomic op, but ~20 cycles is still plenty fast for most practical 
> purposes.

I think optimised our unlock_page in a way that it can do a
non-atomic unlock in the fastpath (no waiters) using 2 bits. In
practice it was still atomic but only because other page flags
operations could operate on ->flags at the same time.

I still have to get around to submitting the damn thing upstream,
but maybe if I bring it up here, I get the idea more reviewed :)

Anyway, the algorithm goes like this:

lock_page()
{
  if (test_and_set_bit_lock(PG_locked))
    lock_page_slow();
}

lock_page_slow()
{
  /* slowpath */
again:
  prepare_to_wait();
  set_bit(PG_waiters);
  if (test_and_set_bit_lock(PG_locked)) {
    schedule();
    goto again;
  }
}

wake_page_waiters()
{
  /* slowpath */
  clear_bit(PG_waiters);
  smp_mb__after_clear_bit(); /* order clear_bit store with wake_up_page load */
  wake_up_page(PG_locked);
}

unlock_page()
{
  clear_bit_unlock(PG_locked);
  if (unlikely(test_bit(PG_waiters)))
    wake_page_waiters();
}

We don't require any load/store barrier in the unlock_page fastpath
because the bits are in the same word, so cache coherency gives us a
sequential ordering anyway.

Now you notice the lock page slowpath can set bits without holding the
lock, at first glance you'd think the clear_bit to unlock has to be
atomic too then. But actually if we're careful, we can put them in seperate
parts of the word and use the sub-word operations on x86 to avoid the atomic
requirement. I'm not aware of any architecture in which operations to the
same word could be out of order.

Not only does this avoid all barriers (except acquire/release) in the
fastpaths, but it also avoids the unconditional load of the random
hash page waitqueue cacheline on unlock.

Anything applicable to your problem at hand?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ