linux-kernel - Re: [PATCH] rtmutex: ensure we wake up the top waiter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230131175350.s7eiz55fozlhaegh@fedora>
Date:   Tue, 31 Jan 2023 14:53:50 -0300
From:   Wander Lairson Costa <wander@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "open list:LOCKING PRIMITIVES" <linux-kernel@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH] rtmutex: ensure we wake up the top waiter

On Tue, Jan 31, 2023 at 02:46:19PM -0300, Wander Lairson Costa wrote:
> On Thu, Jan 19, 2023 at 5:54 PM Thomas Gleixner <tglx@...utronix.de> wrote:
> >
> > Wander!
> >
> > On Wed, Jan 18 2023 at 15:49, Wander Lairson Costa wrote:
> > > On Tue, Jan 17, 2023 at 9:05 PM Thomas Gleixner <tglx@...utronix.de> wrote:
> > >> On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote:
> > >> > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue
> > >> > phase, waiter may be initially in the top of the queue, but after
> > >> > dequeued and requeued it may no longer be true.
> > >>
> > >> That's related to your above argumentation in which way?
> > >>
> > >
> > > I think I made the mistake of not explicitly saying at least three
> > > tasks are involved:
> > >
> > > - A Task T1 that currently holds the mutex
> > > - A Task T2 that is the top waiter
> > > - A Task T3 that changes the top waiter
> > >
> > > T3 tries to acquire the mutex, but as T1 holds it, it calls
> > > task_blocked_on_lock() and saves the owner. It eventually calls
> > > rt_mutex_adjust_prio_chain(), but it releases the wait lock before
> > > doing so. This opens a window for T1 to release the mutex and wake up
> > > T2. Before T2 runs, T3 acquires the wait lock again inside
> > > rt_mutex_adjust_prio_chain(). If the "dequeue/requeue" piece of code
> > > changes the top waiter, then 1) When T2 runs, it will verify that it
> > > is no longer the top waiter and comes back to sleep 2) As you observed
> > > below, the waiter doesn't point to the top waiter and, therefore, it
> > > will wake up the wrong task.
> >
> > This is still confusing as hell because the wait locks you are talking
> > about belong to different rtmutexes. So there is no drops wait lock and
> > reacquires wait lock window.
> >
> > There must be (multiple) lock chains involved, but I couldn't figure out
> > yet what the actaul scenario is in the case of a pure rt_spinlock clock
> > chain:
> >
> 
> Sorry, it took so long to reply. I didn't have the traces anymore and
> had to regenerate them. You spotted an error in my analysis, then I
> had to start over.
> 
> > > Another piece of information I forgot: I spotted the bug in the
> > > spinlock_rt, which uses a rtmutex under the hood. It has a different
> > > code path in the lock scenario, and there is no call to
> > > remove_waiter() (or I am missing something).
> >
> > Correct. But this still might be a lock chain issue where a non
> > rt_spinlock which allows early removal.
> >
> > > Anyway, you summed it up pretty well here: "@waiter is no longer the
> > > top waiter due to the requeue operation". I tried (and failed) to
> > > explain the call chain that ends up in the buggy scenario, but now I
> > > think I should just describe the fundamental problem (the waiter
> > > doesn't point to the top waiter).
> >
> > You really want to provide the information WHY this case can happen at
> > all. If it's not the removal case and related to some other obscure lock
> > chain problem, then we really need to understand the scenario because
> > that lets us analyze whether there are other holes we did not think
> > about yet.
> >
> > If you have traces which show the sequence of lock events leading to
> > this problem, then you should be able to decode the scenario. If you
> > fail to extract the information, then please provide the traces so we
> > can stare at them.
> >
> 
> Here we go:
> 
> Let L1 and L2 be two spinlocks.
> 
> Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
> waiter of L2.
> 
> Let T2 be the task holding L2.
> 
> Let T3 be a task trying to acquire L1.
> 
> The following events will lead to a state in which the wait queue of L2
> isn't empty but nobody holds it.
> 
> T1                              T2                              T3
>                                                                 spin_lock(L1)
> 
> raw_spin_lock(L1->wait_lock)
> 
> rtlock_slowlock_locked(L1)
> 
> task_blocks_on_rt_mutex(L1, T3)
> 
> orig_waiter->lock = L1
> 
> orig_waiter->task = T3
> 
> raw_spin_unlock(L1->wait_lock)
> 
> rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)
> 
>                                 spin_unlock(L2)
>                                   rt_mutex_slowunlock(L2)
>                                     raw_spin_lock(L2->wait_lock)
>                                     wakeup(T1)
>                                     raw_spin_unlock(L2->wait_lock)
> 
>        waiter = T1->pi_blocked_on
> 
>        waiter == rt_mutex_top_waiter(L2)
> 
>        waiter->task == T1
> 
>        raw_spin_lock(L2->wait_lock)
> 
>        dequeue(L2, waiter)
> 
>        update_prio(waiter, T1)
> 
>        enqueue(L2, waiter)
> 
>        waiter != rt_mutex_top_waiter(L2)
> 
>        L2->owner == NULL
> 
>        wakeup(T1)
> 
>        raw_spin_unlock(L2->wait_lock)
> T1 wakes up
> T1 != top_waiter(L2)
> schedule_rtlock()
> 

Arrrrghhhh, s**t mail servers... Hopefully now it is formatted correctly:

Let L1 and L2 be two spinlocks.

Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
waiter of L2.

Let T2 be the task holding L2.

Let T3 be a task trying to acquire L1.

The following events will lead to a state in which the wait queue of L2
isn't empty but nobody holds it.

T1                              T2                              T3
                                                                spin_lock(L1)
                                                                  raw_spin_lock(L1->wait_lock)
                                                                  rtlock_slowlock_locked(L1)
                                                                    task_blocks_on_rt_mutex(L1, T3)
                                                                      orig_waiter->lock = L1
                                                                      orig_waiter->task = T3
                                                                      raw_spin_unlock(L1->wait_lock)
                                                                      rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)

                                spin_unlock(L2)
                                  rt_mutex_slowunlock(L2)
                                    raw_spin_lock(L2->wait_lock)
                                    wakeup(T1)
                                    raw_spin_unlock(L2->wait_lock)
                                                                              waiter = T1->pi_blocked_on
                                                                              waiter == rt_mutex_top_waiter(L2)
                                                                              waiter->task == T1
                                                                              raw_spin_lock(L2->wait_lock)
                                                                              dequeue(L2, waiter)
                                                                              update_prio(waiter, T1)
                                                                              enqueue(L2, waiter)
                                                                              waiter != rt_mutex_top_waiter(L2)
                                                                              L2->owner == NULL
                                                                              wakeup(T1)
                                                                              raw_spin_unlock(L2->wait_lock)
T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()