Date:   Wed, 08 Aug 2018 18:50:06 -0400
From:   Jeff Layton <jlayton@...nel.org>
To:     "J. Bruce Fields" <bfields@...ldses.org>,
        NeilBrown <neilb@...e.com>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Martin Wilck <mwilck@...e.de>
Subject: Re: [PATCH 0/4] locks: avoid thundering-herd wake-ups

On Wed, 2018-08-08 at 17:28 -0400, J. Bruce Fields wrote:
> On Wed, Aug 08, 2018 at 04:09:12PM -0400, J. Bruce Fields wrote:
> > On Wed, Aug 08, 2018 at 03:54:45PM -0400, J. Bruce Fields wrote:
> > > On Wed, Aug 08, 2018 at 11:51:07AM +1000, NeilBrown wrote:
> > > > If you have a many-core machine, and have many threads all wanting to
> > > > briefly lock a given file (udev is known to do this), you can get quite
> > > > poor performance.
> > > > 
> > > > When one thread releases a lock, it wakes up all other threads that
> > > > are waiting (classic thundering-herd) - one will get the lock and the
> > > > others go to sleep.
> > > > When you have few cores, this is not very noticeable: by the time the
> > > > 4th or 5th thread gets enough CPU time to try to claim the lock, the
> > > > earlier threads have claimed it, done what was needed, and released.
> > > > With 50+ cores, the contention can easily be measured.
> > > > 
> > > > This patchset creates a tree of pending lock requests in which siblings
> > > > don't conflict and each lock request does conflict with its parent.
> > > > When a lock is released, only requests which don't conflict with each
> > > > other are woken.
> > > 
> > > Are you sure you aren't depending on the (incorrect) assumption that "X
> > > blocks Y" is a transitive relation?
> > > 
> > > OK I should be able to answer that question myself, my patience for
> > > code-reading is at a real low this afternoon....
> > 
> > In other words, is there the possibility of a tree of, say, exclusive
> > locks with (offset, length) like:
> > 
> > 	(0, 2) waiting on (1, 2) waiting on (2, 2) waiting on (0, 4)
> > 
> > and when waking (0, 4) you could wake up (2, 2) but not (0, 2), leaving
> > a process waiting without there being an actual conflict.
> 
> After batting it back and forth with Jeff on IRC....  So do I understand
> right that when we wake a waiter, we leave its own tree of waiters
> intact, and when it wakes if it finds a conflict it just adds its lock
> (with tree of waiters) into the tree of the conflicting lock?
> 
> If so then yes I think that depends on the transitivity
> assumption--you're assuming that finding a conflict between the root of
> the tree and a lock proves that all the other members of the tree also
> conflict.
> 
> So maybe this example works.  (All locks are exclusive and written
> (offset, length), X->Y means X is waiting on Y.)
> 
> 	process acquires (0,3)
> 	2nd process requests (1,2), is put to sleep.
> 	3rd process requests (0,2), is put to sleep.
> 
> 	The tree of waiters now looks like (0,2)->(1,2)->(0,3)
> 
> 	(0,3) is unlocked.
> 	A 4th process races in and locks (2,2).
> 	The 2nd process wakes up, sees this new conflict, and waits on
> 	(2,2).  Now the tree looks like (0,2)->(1,2)->(2,2), and (0,2)
> 	is waiting for no reason.
> 

That seems like a legit problem.
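
To make the non-transitivity concrete, here is a minimal sketch in Python
(not kernel code). It assumes exclusive locks written as (offset, length),
as in the example above; the conflicts() helper is hypothetical and only
checks byte-range overlap.

```python
def conflicts(a, b):
    """Return True if two exclusive (offset, length) ranges overlap."""
    a_start, a_len = a
    b_start, b_len = b
    return a_start < b_start + b_len and b_start < a_start + a_len


# The three requests from the example above.
first, second, third = (0, 2), (1, 2), (2, 2)

first_vs_second = conflicts(first, second)   # bytes 0-1 vs 1-2: overlap
second_vs_third = conflicts(second, third)   # bytes 1-2 vs 2-3: overlap
first_vs_third = conflicts(first, third)     # bytes 0-1 vs 2-3: disjoint

print(first_vs_second, second_vs_third, first_vs_third)
```

So (0,2) conflicts with (1,2) and (1,2) conflicts with (2,2), but (0,2)
and (2,2) don't touch each other at all.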

One possible fix might be to have the waiter on (1,2) walk down its
entire subtree and wake up any waiter whose request doesn't conflict
with the lock that (1,2) is now going to wait on.

So, before the task waiting on (1,2) goes back to sleep to wait on (2,2),
it could walk down its entire fl_blocked subtree and wake up anything
whose request doesn't conflict with (2,2).

That's potentially an expensive operation, but:

a) the task is going back to sleep anyway, so letting it do a little
extra work before that should be no big deal

b) it's probably still cheaper than waking up the whole herd
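
The walk described above could look roughly like the following Python
sketch. It is only an illustration of the idea, not a patch: Waiter,
its blocked list (standing in for the fl_blocked subtree), conflicts()
and prune_before_sleep() are all hypothetical names, locks are assumed
exclusive and written (offset, length), and the locking that the real
code would need around the walk is left out.

```python
class Waiter:
    """A pending lock request plus the requests blocked behind it."""

    def __init__(self, rng):
        self.rng = rng       # (offset, length), assumed exclusive
        self.blocked = []    # stand-in for the fl_blocked subtree


def conflicts(a, b):
    """Return True if two exclusive (offset, length) ranges overlap."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def prune_before_sleep(waiter, new_blocker):
    """Detach every request under waiter that doesn't conflict with
    new_blocker and return them so the caller can wake them."""
    woken = []

    def walk(node):
        keep = []
        for child in node.blocked:
            if conflicts(child.rng, new_blocker):
                keep.append(child)
                walk(child)          # keep checking further down
            else:
                # Wake it with its own subtree intact; it will
                # re-evaluate its conflicts when it retries.
                woken.append(child)
        node.blocked = keep

    walk(waiter)
    return woken


# The example from above: (0,2) waits on (1,2), which is about to
# wait on the newly acquired (2,2).
w12 = Waiter((1, 2))
w02 = Waiter((0, 2))
w12.blocked.append(w02)

woken = prune_before_sleep(w12, (2, 2))
print([w.rng for w in woken], [w.rng for w in w12.blocked])
```

Here (0,2) is handed back to be woken rather than being dragged along
under (2,2), while anything that really does conflict with (2,2) stays
in the subtree.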

-- 
Jeff Layton <jlayton@...nel.org>
