linux-kernel - Re: [PATCH v3] wait: prevent waiter starvation in __wait_on_bit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090120203131.GA20985@cmpxchg.org>
Date:	Tue, 20 Jan 2009 21:31:31 +0100
From:	Johannes Weiner <hannes@...xchg.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Chris Mason <chris.mason@...cle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Matthew Wilcox <matthew@....cx>,
	Chuck Lever <cel@...i.umich.edu>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v3] wait: prevent waiter starvation in __wait_on_bit_lock

On Sun, Jan 18, 2009 at 03:32:11AM +0100, Oleg Nesterov wrote:
> On 01/18, Johannes Weiner wrote:
> >
> > On Sat, Jan 17, 2009 at 10:51:10PM +0100, Oleg Nesterov wrote:
> > >
> > > 	if ((ret = (*action)(q->key.flags))) {
> > > 		__wake_up_bit(wq, q->key.flags, q->key.bit_nr);
> > > 		// or just __wake_up(wq, TASK_NORMAL, 1, &q->key);
> > > 		break;
> > > 	}
> > >
> > > IOW, imho __wait_on_bit_lock() is buggy, not __lock_page_killable(),
> > > no?
> >
> > I agree with you, already replied with a patch to linux-mm where Chris
> > posted it originally.
> >
> > Peter noted that we have a spurious wake up in the case where A holds
> > the page lock, B and C wait, B gets killed and does a wake up, then A
> > unlocks and does a wake up.  Your proposal has this problem too,
> > right?
> 
> Yes sure. But I can't see how it is possible to avoid the false
> wakeup for sure, please see below.
> 
> > @@ -182,8 +182,20 @@ __wait_on_bit_lock(wait_queue_head_t *wq
> >  	do {
> >  		prepare_to_wait_exclusive(wq, &q->wait, mode);
> >  		if (test_bit(q->key.bit_nr, q->key.flags)) {
> > -			if ((ret = (*action)(q->key.flags)))
> > +			ret = action(q->key.flags);
> > +			if (ret) {
> > +				/*
> > +				 * Contenders are woken exclusively.  If
> > +				 * we do not take the lock when woken up
> > +				 * from an unlock, we have to make sure to
> > +				 * wake the next waiter in line or noone
> > +				 * will and shkle will wait forever.
> > +				 */
> > +				if (!test_bit(q->key.bit_nr, q->key.flags))
> > +					__wake_up_bit(wq, q->key.flags,
> 
> Afaics, the spurious wake up is still possible if SIGKILL and
> unlock_page() happen "at the same time".
> 
> 	__wait_on_bit_lock:			unlock_page:
> 
> 						clear_bit_unlock()
> 	test_bit() == T
> 
> 	__wake_up_bit()				wake_up_page()
> 
> Note that sync_page_killable() returns with ->state == TASK_RUNNING,
> __wake_up() will "ignore" us.

Yeah, we are not completely out of the woods.  But it's getting
better.  The code is still pretty clear, the major bug is fixed and we
even managed to get the race window for false wake ups pretty small.
Isn't that great? :)

> But, more importantly, I'm afraid we can also have the false negative,
> this "if (!test_bit())" test lacks the barriers. This can't happen with
> sync_page_killable() because it always calls schedule(). But let's
> suppose we modify it to check signal_pending() first:
> 
> 	static int sync_page_killable(void *word)
> 	{
> 		if (fatal_signal_pending(current))
> 			return -EINTR;
> 		return sync_page(word);
> 	}
> 
> It is still correct, but unless I missed something now __wait_on_bit_lock()
> has problems again.

Hm, this would require the lock bit to be set without someone else
doing the wakeup.  How could this happen?

I could think of wake_up_page() happening BEFORE clear_bit_unlock()
and we have to be on the front of the waitqueue.  Then we are already
running, the wake up is a nop, the !test_bit() is false and noone
wakes up the next real contender.

But the wake up side uses a smp barrier after clearing the bit, so if
the bit is not cleared we can expect a wake up, no?

Or do we still need a read-side barrier before the test bit?  Damned
coherency...

> But don't get me wrong, I think you are right and it is better to minimize
> the possibility of the false wakeup.
> 
> Oleg.

Thanks,
	Hannes
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/