lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20240926151315.507905-1-longman@redhat.com>
Date: Thu, 26 Sep 2024 11:13:15 -0400
From: Waiman Long <longman@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Will Deacon <will@...nel.org>,
	Boqun Feng <boqun.feng@...il.com>
Cc: linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Luis Goncalves <lgoncalv@...hat.com>,
	Chunyu Hu <chuhu@...hat.com>,
	Waiman Long <longman@...hat.com>
Subject: [PATCH] locking/rtmutex: Always use trylock in rt_mutex_trylock()

One reason to use a trylock is to avoid a ABBA deadlock in case we need
to use a locking sequence that is not in the expected locking order. That
requires the use of trylock all the ways in the abnormal locking
sequence. Unfortunately, this is not the case for rt_mutex_trylock() as
it uses a raw_spin_lock_irqsave() to acquire the lock->wait_lock. That
will cause a lockdep splat like the following in a PREEMPT_RT kernel:

[   63.695668] -> #0 (&lock->wait_lock){-...}-{2:2}:
[   63.695674]        check_prev_add+0x1bd/0x1310
[   63.695678]        validate_chain+0x6cf/0x7c0
[   63.695682]        __lock_acquire+0x879/0xf40
[   63.695686]        lock_acquire.part.0+0xfa/0x2d0
[   63.695690]        _raw_spin_lock_irqsave+0x46/0x90
[   63.695695]        rt_mutex_slowtrylock+0x3f/0xb0
[   63.695699]        rt_spin_trylock+0x13/0xc0
[   63.695702]        rmqueue_pcplist+0x5b/0x180
[   63.695705]        rmqueue+0xb01/0x1040
[   63.695708]        get_page_from_freelist+0x1d0/0xa00
[   63.695712]        __alloc_pages_noprof+0x19a/0x450
[   63.695716]        alloc_pages_mpol_noprof+0xaf/0x1e0
[   63.695721]        stack_depot_save_flags+0x4db/0x520
[   63.695727]        kasan_save_stack+0x3f/0x50
[   63.695731]        __kasan_record_aux_stack+0x8e/0xa0
[   63.695736]        task_work_add+0x1ad/0x240
[   63.695741]        sched_tick+0x1c7/0x500
[   63.695744]        update_process_times+0xf1/0x130
[   63.695750]        tick_nohz_handler+0xf7/0x230
[   63.695754]        __hrtimer_run_queues+0x13b/0x7b0
[   63.695758]        hrtimer_interrupt+0x1c2/0x350
[   63.695763]        __sysvec_apic_timer_interrupt+0xdb/0x340
[   63.695770]        sysvec_apic_timer_interrupt+0x9c/0xd0
[   63.695774]        asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   63.695780]        __asan_load8+0x8/0xa0
[   63.695784]        mas_wr_end_piv+0x28/0x2c0
[   63.695789]        mas_preallocate+0x3a8/0x680
[   63.695793]        vma_shrink+0x180/0x3f0
[   63.695799]        shift_arg_pages+0x219/0x2c0
[   63.695804]        setup_arg_pages+0x343/0x5e0
[   63.695807]        load_elf_binary+0x5ac/0x15d0
[   63.695813]        search_binary_handler+0x125/0x370
[   63.695818]        exec_binprm+0xc9/0x3d0
[   63.695821]        bprm_execve+0x103/0x230
[   63.695824]        kernel_execve+0x187/0x1c0
[   63.695828]        call_usermodehelper_exec_async+0x145/0x1e0
[   63.695832]        ret_from_fork+0x31/0x60
[   63.695836]        ret_from_fork_asm+0x1a/0x30
[   63.695840]
[   63.695840] other info that might help us debug this:
[   63.695840]
[   63.695842] Chain exists of:
[   63.695842]   &lock->wait_lock --> &p->pi_lock --> &rq->__lock
[   63.695842]
[   63.695850]  Possible unsafe locking scenario:
[   63.695850]
[   63.695851]        CPU0                    CPU1
[   63.695852]        ----                    ----
[   63.695854]   lock(&rq->__lock);
[   63.695857]                                lock(&p->pi_lock);
[   63.695861]                                lock(&rq->__lock);
[   63.695864]   lock(&lock->wait_lock);
[   63.695868]
[   63.695868]  *** DEADLOCK ***

Fix it by using raw_spin_trylock_irqsave() instead.

Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Signed-off-by: Waiman Long <longman@...hat.com>
---
 kernel/locking/rtmutex.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index ebebd0eec7f6..a32bc2bb5d5e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1381,10 +1381,13 @@ static int __sched rt_mutex_slowtrylock(struct rt_mutex_base *lock)
 		return 0;
 
 	/*
-	 * The mutex has currently no owner. Lock the wait lock and try to
-	 * acquire the lock. We use irqsave here to support early boot calls.
+	 * The mutex has currently no owner. Try to lock the wait lock first.
+	 * If successful, try to acquire the lock. We use irqsave here to
+	 * support early boot calls. Trylock is used all the way to avoid
+	 * circular lock dependency.
 	 */
-	raw_spin_lock_irqsave(&lock->wait_lock, flags);
+	if (!raw_spin_trylock_irqsave(&lock->wait_lock, flags))
+		return 0;
 
 	ret = __rt_mutex_slowtrylock(lock);
 
-- 
2.43.5


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ