linux-kernel - Re: [PATCH] rtmutex: ensure we wake up the top waiter

Message-ID: <CAAq0SUkN38V00HqV3Hk3ee_-=vfkKxG9xtR3n=4gAT+zCs+=Zg@mail.gmail.com>
Date:   Wed, 18 Jan 2023 15:49:37 -0300
From:   Wander Lairson Costa <wander@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "open list:LOCKING PRIMITIVES" <linux-kernel@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH] rtmutex: ensure we wake up the top waiter

On Tue, Jan 17, 2023 at 9:05 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> Wander!
>
> On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote:
> > In task_blocked_on_lock() we save the owner, release the wait_lock and
> > call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock
> > again, the owner may release the lock and deboost.
>
> This does not make sense in several aspects:
>
>   1) Who is 'we'? You, me, someone else? None of us does anything of the
>      above.
>
>         https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
>
>   2) What has task_blocked_on_lock() to do with the logic in
>      rt_mutex_adjust_prio_chain() which is called by other callsites
>      too?
>
>   3) If the owner releases the lock and deboosts then this has
>      absolutely nothing to do with the lock because the priority of
>      the owner is determined by its own priority and the priority of the
>      top most waiter. If the owner releases the lock then it marks the
>      lock ownerless, wakes the top most waiter and deboosts itself. In
>      this owner deboost rt_mutex_adjust_prio_chain() is not involved at
>      all. Why?
>
>      Because the owner deboost does not affect the priority of the
>      waiters at all. It's the other way round: Waiter priority affects
>      the owner priority if the waiter priority is higher than the owner
>      priority.
>
> > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue
> > phase, waiter may be initially in the top of the queue, but after
> > dequeued and requeued it may no longer be true.
>
> That's related to your above argumentation in which way?
>

I think I made the mistake of not explicitly saying that at least
three tasks are involved:

- A Task T1 that currently holds the mutex
- A Task T2 that is the top waiter
- A Task T3 that changes the top waiter

T3 tries to acquire the mutex, but because T1 holds it, T3 calls
task_blocked_on_lock() and saves the owner. It eventually calls
rt_mutex_adjust_prio_chain(), but releases the wait_lock before doing
so. This opens a window for T1 to release the mutex and wake up T2.
Before T2 runs, T3 acquires the wait_lock again inside
rt_mutex_adjust_prio_chain(). If the dequeue/requeue code then changes
the top waiter, two things go wrong:

1) When T2 runs, it sees that it is no longer the top waiter and goes
   back to sleep.
2) As you observed below, the waiter doesn't point to the top waiter,
   so the code wakes up the wrong task.
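
To make the consequence concrete, here is a minimal userspace sketch of
what happens when a waiter that is not the top waiter gets woken on an
ownerless lock. It is a toy model, not kernel code: every name in it
(toy_waiter, toy_lock, toy_top_waiter, toy_try_to_take) is hypothetical,
and it assumes that a woken waiter may take the lock only when it is the
top waiter, with a lower prio value meaning a higher priority.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 4

/* Toy model only: a lower prio value means a higher priority. */
struct toy_waiter {
	const char *name;
	int prio;
};

struct toy_lock {
	struct toy_waiter *owner;	/* NULL means the lock is ownerless */
	struct toy_waiter *waiters[TOY_MAX_WAITERS];
	size_t nr_waiters;
};

/* Scan the waiters and return the highest-priority one, or NULL. */
static struct toy_waiter *toy_top_waiter(const struct toy_lock *lock)
{
	struct toy_waiter *top = NULL;

	for (size_t i = 0; i < lock->nr_waiters; i++)
		if (!top || lock->waiters[i]->prio < top->prio)
			top = lock->waiters[i];
	return top;
}

/*
 * Called by a waiter after it was woken. It succeeds only if the lock is
 * ownerless and the caller is the top waiter. On failure the caller is
 * expected to go back to sleep.
 */
static bool toy_try_to_take(struct toy_lock *lock, struct toy_waiter *w)
{
	if (lock->owner || w != toy_top_waiter(lock))
		return false;

	/* Remove w from the waiter set and make it the owner. */
	for (size_t i = 0; i < lock->nr_waiters; i++) {
		if (lock->waiters[i] == w) {
			lock->waiters[i] = lock->waiters[--lock->nr_waiters];
			break;
		}
	}
	lock->owner = w;
	return true;
}
```

With T2 at prio 10 and T3 at prio 5 both waiting on an ownerless lock,
toy_try_to_take() fails for T2 and the lock stays ownerless until T3
itself is woken and calls it.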


> rt_mutex_adjust_prio_chain()
>
>         lock->wait_lock is held across the whole operation
>
>         prerequeue_top_waiter = rt_mutex_top_waiter(lock);
>
>   This saves the current top waiter before the dequeue()/enqueue()
>   sequence.
>
>         rt_mutex_dequeue(lock, waiter);
>         waiter_update_prio(waiter, task);
>         rt_mutex_enqueue(lock, waiter);
>
>         if (!rt_mutex_owner(lock)) {
>
>   This is the case where the lock has no owner, i.e. the previous owner
>   unlocked and the chainwalk cannot be continued.
>
>   Now the code checks whether the requeue changed the top waiter task:
>
>                 if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
>
>   What can make this condition true?
>
>     1) @waiter is the new top waiter due to the requeue operation
>
>     2) @waiter is no longer the top waiter due to the requeue operation
>
>   So in both cases the new top waiter must be woken up so it can take over
>   the ownerless lock.
>
>   Here is where the code is buggy. It only considers case #1, but not
>   case #2, right?
>
> So your patch is correct, but the explanation in your changelog has
> absolutely nothing to do with the problem.
>
> Why?
>
>   #2 is caused by a top waiter dropping out due to a signal or timeout
>      and thereby deboosting the whole lock chain.
>
>   So the relevant callchain which causes the problem originates from
>   remove_waiter()
>
> See?
>

Another piece of information I forgot: I spotted the bug in
spinlock_rt, which uses an rtmutex under the hood. It takes a different
code path in the lock scenario, and there is no call to
remove_waiter() (unless I am missing something).
Anyway, you summed it up pretty well here: "@waiter is no longer the
top waiter due to the requeue operation". I tried (and failed) to
explain the call chain that ends up in the buggy scenario, but now I
think I should just describe the fundamental problem (the waiter
doesn't point to the top waiter).
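
The fundamental problem can be sketched in a few lines of userspace C.
This is a toy model under stated assumptions, not the kernel
implementation: the names toy_waiter, toy_top, toy_requeue,
wake_target_old and wake_target_new are hypothetical, a lower prio value
means a higher priority, and the "queue" is a plain array whose top is
found by scanning. wake_target_old() models waking @waiter whenever the
top waiter changed; wake_target_new() models waking whichever waiter is
on top after the requeue.

```c
#include <stddef.h>

/* Toy model only: a lower prio value means a higher priority. */
struct toy_waiter {
	const char *name;
	int prio;
};

/* Return the highest-priority waiter in the array, or NULL if empty. */
static struct toy_waiter *toy_top(struct toy_waiter *const *queue, size_t n)
{
	struct toy_waiter *top = NULL;

	for (size_t i = 0; i < n; i++)
		if (!top || queue[i]->prio < top->prio)
			top = queue[i];
	return top;
}

/*
 * Stand-in for the dequeue / update priority / enqueue sequence. The
 * array is unsorted, so updating the priority is all that is needed.
 */
static void toy_requeue(struct toy_waiter *waiter, int new_prio)
{
	waiter->prio = new_prio;
}

/* Wake @waiter if the top waiter changed: wrong when @waiter lost the top. */
static struct toy_waiter *wake_target_old(struct toy_waiter *pre_top,
					  struct toy_waiter *new_top,
					  struct toy_waiter *waiter)
{
	return pre_top != new_top ? waiter : NULL;
}

/* Wake the new top waiter if the top waiter changed: covers both cases. */
static struct toy_waiter *wake_target_new(struct toy_waiter *pre_top,
					  struct toy_waiter *new_top)
{
	return pre_top != new_top ? new_top : NULL;
}
```

In case #1 (@waiter becomes the top waiter) both functions pick the same
task. In case #2 (@waiter drops from the top) wake_target_old() still
returns @waiter, which cannot take the lock, while wake_target_new()
returns the task that can.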

> Thanks,
>
>         tglx
>