Date:   Mon, 09 Aug 2021 12:52:06 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Davidlohr Bueso <dave@...olabs.net>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Mike Galbraith <efault@....de>
Subject: Re: [patch V3 56/64] futex: Correct the number of requeued waiters
 for PI

On Mon, Aug 09 2021 at 10:18, Thomas Gleixner wrote:
> On Sun, Aug 08 2021 at 10:05, Davidlohr Bueso wrote:
>> On Thu, 05 Aug 2021, Thomas Gleixner wrote:
>>
>>>From: Thomas Gleixner <tglx@...utronix.de>
>>>
>>>The accounting is wrong when either the PI sanity check or the
>>>requeue PI operation fails. Adjust it in the failure path.
>>
>> Ok, fortunately these accounting errors are benign considering they
>> are in error paths. This also made me wonder about the requeue PI
>> top-waiter wakeup from futex_proxy_trylock_atomic(), which always
>> requires nr_wake == 1. We account for it in the successful case
>> where we acquired the lock on its behalf (and thus requeue_pi_wake_futex()
>> was called), but if the corresponding lookup_pi_state() fails, we'll
>> retry. So, shouldn't the task_count++ only happen when we know the
>> requeueing comes next (the top waiter successfully acquiring the
>> lock + pi state)?
>>
>> @@ -2260,7 +2260,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
>> 		 */
>> 		if (ret > 0) {
>> 			WARN_ON(pi_state);
>> -			task_count++;
>> 			/*
>> 			 * If we acquired the lock, then the user space value
>> 			 * of uaddr2 should be vpid. It cannot be changed by
>> @@ -2275,6 +2274,8 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
>> 			 */
>> 			ret = lookup_pi_state(uaddr2, ret, hb2, &key2,
>> 					      &pi_state, &exiting);
>> +			if (!ret)
>> +				task_count++;
>> 		}
>
> Yes, but if futex_proxy_trylock_atomic() succeeds and lookup_pi_state()
> fails, then the user space futex value is already the VPID of the top
> waiter and a retry will then fail futex_proxy_trylock_atomic().
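
For reference, the quoted failure mode boils down to the atomic
acquisition only succeeding on an uncontended (zero) futex word. A
simplified userspace model (a sketch only, not the actual
futex_lock_pi_atomic() code, which also handles waiter bits,
OWNER_DIED etc.):

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/*
	 * Model of the uncontended PI acquisition: claim the futex
	 * word by cmpxchg'ing 0 -> VPID. Everything else the kernel
	 * path does is omitted here.
	 */
	static bool proxy_trylock(atomic_uint *uval, unsigned int vpid)
	{
		unsigned int zero = 0;

		return atomic_compare_exchange_strong(uval, &zero, vpid);
	}

	int main(void)
	{
		atomic_uint futex2 = 0;

		/* First pass: acquired on behalf of the top waiter. */
		printf("%d\n", proxy_trylock(&futex2, 1234));	/* 1 */
		/* Retry: the word already holds the VPID -> fails. */
		printf("%d\n", proxy_trylock(&futex2, 1234));	/* 0 */
		return 0;
	}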

Actually lookup_pi_state() cannot fail here.

If futex_proxy_trylock_atomic() takes the user space futex then there
are no waiters on futex2 and the task for which the proxy trylock
acquired futex2 in the user space storage cannot be exiting because it's
still enqueued on futex1.

That means lookup_pi_state() will take the attach_to_pi_owner() path and
that will succeed because the VPID belongs to an alive task.
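
Condensed, the relevant decision looks like this (a minimal model of
lookup_pi_state(); the real function additionally deals with the
owner-exiting race via the &exiting argument):

	#include <stdio.h>

	/*
	 * With waiters on futex2 it attaches to the existing pi_state,
	 * otherwise it attaches to the owner denoted by the user space
	 * VPID, which only fails if that task is gone.
	 */
	static int lookup_pi_state_model(int have_waiters, int owner_alive)
	{
		if (have_waiters)
			return 0;		/* attach_to_pi_state() */
		return owner_alive ? 0 : -3;	/* attach_to_pi_owner(), -ESRCH */
	}

	int main(void)
	{
		/*
		 * Proxy trylock took futex2: no waiters there, and the
		 * new owner is still enqueued on futex1, hence alive.
		 */
		printf("%d\n", lookup_pi_state_model(0, 1));	/* 0 */
		return 0;
	}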

What's wrong in that code though is the condition further up:

	if (requeue_pi && (task_count - nr_wake < nr_requeue)) {

nr_wake has to be 1 for requeue PI. For the first round task_count is 0,
so task_count - nr_wake is -1, which means the condition is true for any
value of nr_requeue >= 0.

It does not really matter because none of the code below that runs the
retry path, but it's at least confusing as hell.
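
Plugging in the only values requeue PI allows makes that obvious
(trivial standalone demo, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		int task_count = 0, nr_wake = 1;	/* first round, requeue PI */

		/*
		 * task_count - nr_wake == -1, so this prints 1 for
		 * every nr_requeue >= 0: the check never cuts
		 * anything off.
		 */
		for (int nr_requeue = 0; nr_requeue <= 2; nr_requeue++)
			printf("nr_requeue=%d: %d\n", nr_requeue,
			       task_count - nr_wake < nr_requeue);
		return 0;
	}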

Let me fix all of that muck.

Thanks,

        tglx
