linux-kernel - Re: [bug report] locking/rtmutex: Return success on deadlock for ww

Message-ID: <YS81R45p7mYbhmrT@hirez.programming.kicks-ass.net>
Date:   Wed, 1 Sep 2021 10:09:43 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Dan Carpenter <dan.carpenter@...cle.com>
Cc:     kernel-janitors@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [bug report] locking/rtmutex: Return success on deadlock for
 ww_mutex waiters

On Tue, Aug 31, 2021 at 11:21:52AM +0300, Dan Carpenter wrote:
> Hello Peter Zijlstra,

Hi Dan :-)

> This is a semi-automatic email about new static checker warnings.
> 
> The patch a055fcc132d4: "locking/rtmutex: Return success on deadlock
> for ww_mutex waiters" from Aug 26, 2021, leads to the following
> Smatch complaint:
> 
>     kernel/locking/rtmutex.c:756 rt_mutex_adjust_prio_chain()
>     error: we previously assumed 'orig_waiter' could be null (see line 644)
> 
> kernel/locking/rtmutex.c
>    643		 */
>    644		if (orig_waiter && !rt_mutex_owner(orig_lock))
>                     ^^^^^^^^^^^
> A lot of this code assumes "orig_waiter" can be NULL.
> 

>    735		/*
>    736		 * [6] check_exit_conditions_2() protected by task->pi_lock and
>    737		 * lock->wait_lock.
>    738		 *
>    739		 * Deadlock detection. If the lock is the same as the original
>    740		 * lock which caused us to walk the lock chain or if the
>    741		 * current lock is owned by the task which initiated the chain
>    742		 * walk, we detected a deadlock.
>    743		 */
>    744		if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
>                     ^^^^^^^^^^^^^^^^^
> This might mean it's a false positive, but Smatch isn't clever enough to
> figure it out.  And I'm stupid too!  Plus lazy...  and ugly.
> 
>    745			ret = -EDEADLK;
>    746	
>    747			/*
>    748			 * When the deadlock is due to ww_mutex; also see above. Don't
>    749			 * report the deadlock and instead let the ww_mutex wound/die
>    750			 * logic pick which of the contending threads gets -EDEADLK.
>    751			 *
>    752			 * NOTE: assumes the cycle only contains a single ww_class; any
>    753			 * other configuration and we fail to report; also, see
>    754			 * lockdep.
>    755			 */
>    756			if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter->ww_ctx)
>                                                              ^^^^^^^^^^^^^^^^^^^
> Unchecked dereference.


This is difficult... and I'm glad you flagged it. The normal de-boost
path is through rt_mutex_adjust_prio() and that has: .orig_lock == NULL
&& .orig_waiter == NULL. And as such it would never trigger the above
case.

However, there is remove_waiter() which is called on rt_mutex_lock()'s
failure paths and that doesn't have .orig_lock == NULL, and as such
*could* conceivably trigger this.
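
To make the concern concrete, here is a minimal userspace sketch of the
flagged branch with a NULL guard added. It is an illustration under stated
assumptions, not the kernel code and not necessarily the right fix: the
struct definitions, the PREEMPT_RT_ENABLED macro and the deadlock_result()
helper are hypothetical stand-ins, and the "ret = 0" outcome is inferred
from the patch title ("Return success on deadlock for ww_mutex waiters")
rather than from the quoted lines.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct ww_acquire_ctx { int stamp; };
struct rt_mutex_waiter { struct ww_acquire_ctx *ww_ctx; };

/* Stand-in for IS_ENABLED(CONFIG_PREEMPT_RT); assumed on for this sketch. */
#define PREEMPT_RT_ENABLED 1

/*
 * Models only the deadlock branch quoted above: start from -EDEADLK and
 * report success instead when the chain walk was started by a ww_mutex
 * waiter. The orig_waiter check keeps callers that pass a NULL waiter
 * from dereferencing it.
 */
static int deadlock_result(const struct rt_mutex_waiter *orig_waiter)
{
	int ret = -EDEADLK;

	if (PREEMPT_RT_ENABLED && orig_waiter && orig_waiter->ww_ctx)
		ret = 0;

	return ret;
}
```

With this guard, a NULL orig_waiter and a waiter without a ww_ctx both keep
-EDEADLK, and only a waiter carrying a ww_ctx gets 0. Whether keeping
-EDEADLK is the desired behaviour for a NULL waiter on that path is exactly
the open question here.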

Let me figure out what the right thing to do is.

Thanks!