Date:	Tue, 8 Apr 2014 09:20:42 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jan Stancek <jstancek@...hat.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	Davidlohr Bueso <davidlohr@...com>,
	Ingo Molnar <mingo@...nel.org>,
	Larry Woodman <lwoodman@...hat.com>
Subject: Re: [PATCH] futex: avoid race between requeue and wake

Davidlohr, comments?

On Tue, Apr 8, 2014 at 1:47 AM, Jan Stancek <jstancek@...hat.com> wrote:
> pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
> occasionally fails, because some threads fail to wake up.

Jan, I _assume_ this is on x86(-64), but can you please confirm?

Because if it's on anything else, the whole situation changes.

> Taking hb->lock in this situation will ensure that thread A needs to wait
> in futex_wake() until main thread finishes requeue operation.

So the argument was that doing *both* spin_is_locked() and
atomic_read(&hb->waiters) _should_ be unnecessary, because
hb_waiters_inc() is done *before* getting the spinlock.

However, one exception to this is "requeue_futex()", which is in fact
the path that Jan's test-case exercises. There, when we move a futex
from one hash bucket to another, we do the increment inside the spinlock.
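
A minimal user-space sketch of that ordering difference, for
illustration only. Everything named toy_* is hypothetical and is not
the kernel's code; the "lock" is just a flag so the steps can be
replayed one at a time in a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a futex hash bucket; NOT the kernel's struct. */
struct toy_hb {
    atomic_int  waiters;  /* models atomic_read(&hb->waiters) */
    atomic_bool locked;   /* models spin_is_locked(&hb->lock) */
};

static void toy_hb_init(struct toy_hb *hb)
{
    atomic_init(&hb->waiters, 0);
    atomic_init(&hb->locked, false);
}

/* These only flip a flag; they do not provide mutual exclusion. */
static void toy_lock(struct toy_hb *hb)   { atomic_store(&hb->locked, true); }
static void toy_unlock(struct toy_hb *hb) { atomic_store(&hb->locked, false); }

/* Models hb_waiters_inc(). */
static void toy_waiters_inc(struct toy_hb *hb)
{
    atomic_fetch_add(&hb->waiters, 1);
}

/* Waker-side check that trusts the counter alone. */
static bool toy_pending_counter_only(struct toy_hb *hb)
{
    return atomic_load(&hb->waiters) != 0;
}

/* Waker-side check that also treats a held lock as "maybe a waiter". */
static bool toy_pending_lock_or_counter(struct toy_hb *hb)
{
    return atomic_load(&hb->locked) || atomic_load(&hb->waiters) != 0;
}
```

Replaying the normal wait order (toy_waiters_inc(), then toy_lock())
never leaves a step where the counter-only check misses the waiter.
Replaying the requeue order (toy_lock() on the target bucket, then
toy_waiters_inc()) leaves a window, after the lock is taken and before
the increment, where the counter-only check reports nothing pending
while the lock-or-counter check still does.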

So I think the change is correct, although the commit message might
need a bit of improvement. I also hate that "if/else" thing, since
there's no point in an "else" if the if-statement did a "return". So
either make it just

    if (spin_is_locked(&hb->lock))
        return 1;
    return atomic_read(&hb->waiters);

or (perhaps preferably) make it

    return spin_is_locked(&hb->lock) || atomic_read(&hb->waiters);

but that "if/else" just makes me go "why?".

But I'd also like to have Davidlohr look at this, because I have a few
questions:

 - how did this never show up in the original loads? No requeueing
under those test-loads?

 - would we be better off incrementing the waiter count at the top of
futex_requeue(), at the retry_private label?

That would make us follow the "has to be incremented before taking the
lock" rule, but at the expense of making the error case handling more
complex. Although maybe we could do it as part of
"double_lock/unlock_hb()" and just mark both hb1/hb2 busy?
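
One way the "mark both hb1/hb2 busy" idea could look, as a user-space
sketch under stated assumptions: the busy_* names are hypothetical,
the "lock" is a plain flag rather than a spinlock, and lock ordering
and error handling are left out entirely. It is not a proposed patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a futex hash bucket; NOT the kernel's struct. */
struct busy_hb {
    atomic_int  waiters;  /* models hb->waiters */
    atomic_bool locked;   /* models the state of hb->lock */
};

static void busy_hb_init(struct busy_hb *hb)
{
    atomic_init(&hb->waiters, 0);
    atomic_init(&hb->locked, false);
}

/* Hypothetical double-lock helper: bump both counters *before*
 * touching either lock, so the "increment before taking the lock"
 * rule also holds on the requeue path. */
static void busy_double_lock(struct busy_hb *hb1, struct busy_hb *hb2)
{
    atomic_fetch_add(&hb1->waiters, 1);
    if (hb2 != hb1)
        atomic_fetch_add(&hb2->waiters, 1);

    atomic_store(&hb1->locked, true);
    if (hb2 != hb1)
        atomic_store(&hb2->locked, true);
}

/* Matching unlock: release the locks, then drop the "busy" counts. */
static void busy_double_unlock(struct busy_hb *hb1, struct busy_hb *hb2)
{
    atomic_store(&hb1->locked, false);
    if (hb2 != hb1)
        atomic_store(&hb2->locked, false);

    atomic_fetch_sub(&hb1->waiters, 1);
    if (hb2 != hb1)
        atomic_fetch_sub(&hb2->waiters, 1);
}

/* Waker-side check that looks at the counter only. */
static bool busy_pending(struct busy_hb *hb)
{
    return atomic_load(&hb->waiters) != 0;
}
```

In this model a counter-only check sees both buckets as busy for the
whole time the pair is locked, and the counts return to their previous
values after busy_double_unlock(). The cost is that every exit path
from the locked region has to go through the matching unlock helper,
which is the error-handling complexity mentioned above.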

                      Linus