linux-kernel - Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in __wait_on_bit

Date:   Fri, 2 Sep 2016 14:06:02 +0200
From:   Oleg Nesterov <oleg@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>, Al Viro <viro@...IV.linux.org.uk>,
        Bart Van Assche <bvanassche@....org>,
        Johannes Weiner <hannes@...xchg.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Neil Brown <neilb@...e.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] sched/wait: avoid abort_exclusive_wait() in
        __wait_on_bit_lock()

On 09/01, Peter Zijlstra wrote:
>
> On Fri, Aug 26, 2016 at 02:45:52PM +0200, Oleg Nesterov wrote:
>
> > We do not need anything tricky to avoid the race,
>
> The race being:
>
> CPU0			CPU1			CPU2
>
> 			__wait_on_bit_lock()
> 			  bit_wait_io()
> 			    io_schedule()
>
> clear_bit_unlock()
> __wake_up_common(.nr_exclusive=1)
>   list_for_each_entry()
>     if (curr->func() && !--nr_exclusive)
>       break
>
> 						signal()
>
> 			    if (signal_pending_state() == TRUE)
> 			      return -EINTR
>
> And there is no progress, because CPU1 exits without acquiring the lock
> and CPU0 thinks it's done because it woke someone.

Yes,
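To make the failure concrete, here is a hypothetical single-threaded replay of the interleaving above. It models only what happens if the signalled waiter simply returns -EINTR with no compensating wakeup; struct toy_lock, toy_unlock(), toy_bail_on_signal() and toy_lost_wakeup() are invented for illustration, and the wakeup is reduced to "wake the first queued entry and stop".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of the race quoted above (not kernel code):
 * one lock bit and two exclusive waiters queued in FIFO order. */
struct toy_waiter {
	bool queued;          /* still on the wait queue */
	bool woken;           /* received the exclusive wakeup */
	bool signal_pending;  /* a signal arrived before it re-checked */
};

struct toy_lock {
	bool bit;                  /* the lock bit */
	struct toy_waiter *q[2];   /* exclusive waiters, FIFO order */
};

/* Like __wake_up_common() with nr_exclusive == 1: wake the first queued
 * waiter and stop, regardless of what that waiter does next. */
static void toy_wake_one(struct toy_lock *l)
{
	for (int i = 0; i < 2; i++) {
		if (l->q[i]->queued) {
			l->q[i]->woken = true;
			return;
		}
	}
}

/* Like clear_bit_unlock() followed by the exclusive wakeup. */
static void toy_unlock(struct toy_lock *l)
{
	l->bit = false;
	toy_wake_one(l);
}

/* A woken waiter that bails out on a pending signal: it leaves the queue
 * and returns -EINTR without retrying the bit or passing the wakeup on. */
static int toy_bail_on_signal(struct toy_waiter *w)
{
	w->queued = false;
	return w->signal_pending ? -EINTR : 0;
}

/* Replays the CPU0/CPU1/CPU2 interleaving. Returns true if the lock ends
 * up free while the other waiter is still asleep: the lost wakeup. */
bool toy_lost_wakeup(void)
{
	struct toy_waiter a = { .queued = true }, b = { .queued = true };
	struct toy_lock l = { .bit = true, .q = { &a, &b } };
	int ret;

	toy_unlock(&l);                  /* CPU0: wakes a only */
	a.signal_pending = true;         /* CPU2: signal() */
	ret = toy_bail_on_signal(&a);    /* CPU1: returns -EINTR */
	assert(ret == -EINTR);

	return !l.bit && b.queued && !b.woken;
}
```

toy_lost_wakeup() returns true: the bit is clear, yet waiter b is still queued and was never woken, which is exactly the "no progress" case.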

> > we can just call finish_wait() if action() fails.
>
> That would be bit_wait*() returning -EINTR because a signal is pending.

Hmm. Not sure I understand... Let me reply just in case, even though
I am sure you got it right.

Yes, in the likely case we are going to fail with -EINTR, but only
if the test-and-set after that fails.

> Sure, you can always call that; the first thing the loop does is
> prepare again, so no harm. That, however, does not connect to your
> condition... /me puzzled

If ->action() fails, we leave the loop in any case and prepare won't be
called again, so in this case finish_wait() does the right thing.

> > test_and_set_bit() implies mb(), so
> > the lockless list_empty_careful() case is fine; we cannot miss the
> > condition if we race with unlock_page().
>
> You're talking about this ordering?:
>
> 	finish_wait()			clear_bit_unlock();
> 	  list_empty_careful()
>
> 	/* MB implied */		smp_mb__after_atomic();
> 	test_and_set_bit()		wake_up_page()
> 					  ...
> 					    autoremove_wake_function()
> 					      list_del_init();
>
>
> That could do with spelling out, I feel. :-)

Yes, yes.
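The pairing spelled out above is the classic store-buffering pattern. Below is a hedged sketch that models it with C11 atomics and POSIX threads rather than kernel primitives: lock_bit, on_queue, waiter(), waker() and count_lost_wakeups() are hypothetical names, and atomic_thread_fence(memory_order_seq_cst) stands in for both the mb() implied by test_and_set_bit() and smp_mb__after_atomic(). With a full fence on each side, at least one side must observe the other's store: either the waiter sees the cleared bit and takes the lock, or the waker sees that the entry has already left the queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical litmus-style model (C11 atomics, not kernel primitives):
 *   lock_bit ~ the page lock bit
 *   on_queue ~ "the waiter's entry is still on the wait queue"       */
static atomic_bool lock_bit;
static atomic_bool on_queue;
static bool waiter_got_lock;   /* written by the waiter thread only */
static bool waker_saw_entry;   /* written by the waker thread only */

/* Waiter side: finish_wait() takes the entry off the queue, then
 * test_and_set_bit(); the seq_cst fence stands in for its implied mb(). */
static void *waiter(void *arg)
{
	(void)arg;
	atomic_store_explicit(&on_queue, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	waiter_got_lock = !atomic_exchange_explicit(&lock_bit, true,
						    memory_order_relaxed);
	return NULL;
}

/* Waker side: clear_bit_unlock(), smp_mb__after_atomic(), then the
 * wakeup looks at the queue. */
static void *waker(void *arg)
{
	(void)arg;
	atomic_store_explicit(&lock_bit, false, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);
	waker_saw_entry = atomic_load_explicit(&on_queue, memory_order_relaxed);
	return NULL;
}

/* Races the two sides iters times and counts the forbidden outcome: the
 * waker still saw the entry (so it would spend its wakeup on it) and the
 * waiter failed to take the bit. Returns -1 if a thread cannot start. */
int count_lost_wakeups(int iters)
{
	int bad = 0;

	for (int i = 0; i < iters; i++) {
		pthread_t t1, t2;

		atomic_store(&lock_bit, true);   /* held by the unlocking task */
		atomic_store(&on_queue, true);   /* waiter is queued */
		if (pthread_create(&t1, NULL, waiter, NULL) != 0)
			return -1;
		if (pthread_create(&t2, NULL, waker, NULL) != 0) {
			pthread_join(t1, NULL);
			return -1;
		}
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		if (waker_saw_entry && !waiter_got_lock)
			bad++;
	}
	return bad;
}
```

Built with -pthread, count_lost_wakeups() should return 0 for any iteration count. Dropping either fence removes that guarantee under the C11 memory model, although on a particular machine the bad outcome may be rare or never show up, which is why the barrier deserves a comment rather than a test.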

> >  __wait_on_bit_lock(wait_queue_head_t *wq, struct wait_bit_queue *q,
> >  			wait_bit_action_f *action, unsigned mode)
> >  {
> > +	int ret = 0;
> >
> > +	for (;;) {
> >  		prepare_to_wait_exclusive(wq, &q->wait, mode);
> > +		if (test_bit(q->key.bit_nr, q->key.flags)) {
> > +			ret = action(&q->key, mode);
> > +			/*
> > +			 * Ensure that clear_bit() + wake_up() right after
> > +			 * test_and_set_bit() below can't see us; it should
> > +			 * wake up another exclusive waiter if we fail.
> > +			 */
> > +			if (ret)
> > +				finish_wait(wq, &q->wait);
> > +		}
> > +		if (!test_and_set_bit(q->key.bit_nr, q->key.flags)) {
>
> So this is the actual difference: instead of failing the lock and
> aborting on signal, we acquire the lock if possible. If it's not
> possible, someone else has it, which guarantees that someone else will
> do an unlock, which implies another wakeup, and life goes on.

Yes. This way we eliminate the need for the additional wake_up.
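The quoted loop can be modelled single-threaded to show both outcomes described above. This is a sketch, not the kernel function: the patch excerpt is cut off, so the model assumes that a successful test_and_set_bit() ends the wait with 0 and that a failed one returns the action's error. struct toy_state, the three stand-in actions and toy_wait_on_bit_lock() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the quoted loop (not kernel code). */
struct toy_state {
	bool bit;      /* the lock bit */
	bool queued;   /* our entry is on the wait queue */
};

/* Stand-ins for action(); some also pretend the owner unlocked meanwhile. */
int sleep_until_unlocked(struct toy_state *s) { s->bit = false; return 0; }
int signal_after_unlock(struct toy_state *s)  { s->bit = false; return -EINTR; }
int signal_while_held(struct toy_state *s)    { (void)s; return -EINTR; }

int toy_wait_on_bit_lock(struct toy_state *s,
			 int (*action)(struct toy_state *))
{
	int ret = 0;

	for (;;) {
		s->queued = true;                /* prepare_to_wait_exclusive() */
		if (s->bit) {                    /* test_bit() */
			ret = action(s);
			if (ret)
				s->queued = false;   /* finish_wait() on failure */
		}
		bool was_set = s->bit;           /* test_and_set_bit() */
		s->bit = true;
		if (!was_set) {
			s->queued = false;       /* done waiting, we own the bit */
			return 0;                /* even if action() failed */
		}
		if (ret)
			return ret;              /* the owner's unlock wakes the next waiter */
	}
}
```

With the bit held and an action that reports -EINTR after the owner has unlocked (signal_after_unlock), the model returns 0 and owns the bit; if the owner still holds it (signal_while_held), it returns -EINTR with the entry off the queue, and the owner's own unlock will issue the next wakeup.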

Oleg.