linux-kernel - [PATCH v3] locking/rtmutex: Always use trylock in rt_mutex

Message-ID: <20241007152910.29457-1-longman@redhat.com>
Date: Mon,  7 Oct 2024 11:29:10 -0400
From: Waiman Long <longman@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Will Deacon <will@...nel.org>,
	Boqun Feng <boqun.feng@...il.com>
Cc: linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Luis Goncalves <lgoncalv@...hat.com>,
	Chunyu Hu <chuhu@...hat.com>,
	Waiman Long <longman@...hat.com>
Subject: [PATCH v3] locking/rtmutex: Always use trylock in rt_mutex_trylock()

One reason to use a trylock is to avoid an ABBA deadlock in case we need
to use a locking sequence that is not in the expected locking order. That
requires the use of trylock all the way in the abnormal locking
sequence. Unfortunately, this is not the case for rt_mutex_trylock() as
it uses a raw_spin_lock_irqsave() to acquire the lock->wait_lock.
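
As a rough userspace illustration of "trylock all the way" (not the
kernel code), the sketch below uses POSIX mutexes. The names toy_lock,
toy_trylock() and toy_unlock() are hypothetical; the guard mutex stands
in for lock->wait_lock and the owned flag for the rtmutex owner. The
point is that the try operation must not block on the internal guard
either, otherwise it is only a trylock in name.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical toy lock: 'guard' plays the role of wait_lock. */
struct toy_lock {
	pthread_mutex_t guard;	/* protects 'owned' */
	bool owned;		/* true while somebody holds the toy lock */
};

static void toy_lock_init(struct toy_lock *l)
{
	pthread_mutex_init(&l->guard, NULL);
	l->owned = false;
}

/* Trylock all the way: never waits, not even for the internal guard. */
static bool toy_trylock(struct toy_lock *l)
{
	bool got = false;

	if (pthread_mutex_trylock(&l->guard) != 0)
		return false;	/* guard is busy: fail instead of blocking */

	if (!l->owned) {
		l->owned = true;
		got = true;
	}
	pthread_mutex_unlock(&l->guard);
	return got;
}

static void toy_unlock(struct toy_lock *l)
{
	pthread_mutex_lock(&l->guard);
	l->owned = false;
	pthread_mutex_unlock(&l->guard);
}

/* Single-threaded walk through the three interesting cases. */
static void toy_trylock_demo(void)
{
	struct toy_lock l;

	toy_lock_init(&l);
	assert(toy_trylock(&l));	/* free lock: acquired */
	assert(!toy_trylock(&l));	/* already owned: fails */
	toy_unlock(&l);

	pthread_mutex_lock(&l.guard);	/* simulate a contended guard */
	assert(!toy_trylock(&l));	/* returns at once, no waiting */
	pthread_mutex_unlock(&l.guard);
}
```

A variant that took the guard with pthread_mutex_lock() would wait in
the last case, which is the userspace analogue of the behavior
described above.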

There are just a few rt_mutex_trylock() call sites in the stock kernel.
For a PREEMPT_RT kernel, however, all the spin_trylock() calls become
rt_mutex_trylock(). There are a few hundred of them, so it is a lot
easier to trigger a circular locking lockdep splat.

One particular instance where a circular locking lockdep splat happens
is on a v6.11 debug kernel with KASAN enabled:

   sched_tick() [ acquire rq->lock ]
   -> task_work_add()
    -> __kasan_record_aux_stack()
     -> kasan_save_stack()
      -> stack_depot_save_flags()
       -> alloc_pages_mpol_noprof()
        -> __alloc_pages_noprof()
         -> get_page_from_freelist()
          -> rmqueue()
           -> rmqueue_pcplist()
            -> rt_spin_trylock()
             -> rt_mutex_slowtrylock()
              -> _raw_spin_lock_irqsave() [ acquire wait_lock ]

[   63.695462] ======================================================
[   63.695464] WARNING: possible circular locking dependency detected
[   63.695467] 6.11.0-0.rc5.22.el10.x86_64+rt-debug #1 Not tainted
[   63.695470] ------------------------------------------------------
[   63.695473] modprobe/610 is trying to acquire lock:
[   63.695476] ff110007e9613058 (&lock->wait_lock){-...}-{2:2}, at: rt_mutex_slowtrylock+0x3f/0xb0
[   63.695495]
[   63.695495] but task is already holding lock:
[   63.695497] ff110007e96096d8 (&rq->__lock){-...}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0xc0

The rtmutex API has the locking dependency "&lock->wait_lock -->
&p->pi_lock". The scheduler code has the locking dependency "&p->pi_lock
--> &rq->__lock". As a result, there is a circular lock dependency.
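
The cycle can be reproduced with a toy lock-order graph. This is only a
sketch of the reasoning, not how lockdep is implemented; the enum
names, edge[] table and would_deadlock() helper are hypothetical. An
edge A -> B means "B was acquired while A was held". Taking wait_lock
in a blocking fashion while holding rq->__lock closes the cycle because
wait_lock already reaches rq->__lock through pi_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes named after the ones in the report. */
enum { WAIT_LOCK, PI_LOCK, RQ_LOCK, NR_CLASSES };

/* edge[a][b] is true when b has been acquired while a was held. */
static bool edge[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via edges? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (edge[from][next] && !seen[next] &&
		    reaches(next, to, seen))
			return true;
	return false;
}

/*
 * A blocking acquisition of 'wanted' while holding 'held' adds the
 * edge held -> wanted. That closes a cycle if 'wanted' can already
 * reach 'held'.
 */
static bool would_deadlock(int held, int wanted)
{
	bool seen[NR_CLASSES] = { false };

	return reaches(wanted, held, seen);
}

static void toy_lock_order_demo(void)
{
	edge[WAIT_LOCK][PI_LOCK] = true;	/* rtmutex code */
	edge[PI_LOCK][RQ_LOCK] = true;		/* scheduler code */

	/* rq->__lock held, blocking on wait_lock: circular. */
	assert(would_deadlock(RQ_LOCK, WAIT_LOCK));
	/* The expected order stays acyclic. */
	assert(!would_deadlock(WAIT_LOCK, RQ_LOCK));
}
```

A trylock cannot wait, so in this model it would not add the closing
edge at all.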

[   63.695842] Chain exists of:
[   63.695842]   &lock->wait_lock --> &p->pi_lock --> &rq->__lock
[   63.695842]
[   63.695850]  Possible unsafe locking scenario:
[   63.695850]
[   63.695851]        CPU0                    CPU1
[   63.695852]        ----                    ----
[   63.695854]   lock(&rq->__lock);
[   63.695857]                                lock(&p->pi_lock);
[   63.695861]                                lock(&rq->__lock);
[   63.695864]   lock(&lock->wait_lock);
[   63.695868]
[   63.695868]  *** DEADLOCK ***

Fix it by using raw_spin_trylock_irqsave() in rt_mutex_slowtrylock()
instead.

Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Signed-off-by: Waiman Long <longman@...hat.com>
Reviewed-by: Thomas Gleixner <tglx@...utronix.de>
---
 kernel/locking/rtmutex.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index ebebd0eec7f6..a32bc2bb5d5e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1381,10 +1381,13 @@ static int __sched rt_mutex_slowtrylock(struct rt_mutex_base *lock)
 		return 0;
 
 	/*
-	 * The mutex has currently no owner. Lock the wait lock and try to
-	 * acquire the lock. We use irqsave here to support early boot calls.
+	 * The mutex has currently no owner. Try to lock the wait lock first.
+	 * If successful, try to acquire the lock. We use irqsave here to
+	 * support early boot calls. Trylock is used all the way to avoid
+	 * circular lock dependency.
 	 */
-	raw_spin_lock_irqsave(&lock->wait_lock, flags);
+	if (!raw_spin_trylock_irqsave(&lock->wait_lock, flags))
+		return 0;
 
 	ret = __rt_mutex_slowtrylock(lock);
 
-- 
2.46.2