linux-kernel - Semantics vs. usage of mutex_is


Message-ID: <CAFqZXNt0Xp1j7+hTrV9XZ936Yz+H8Le0pqazhLr3drO0tEzB2w@mail.gmail.com>
Date:   Mon, 7 Feb 2022 16:15:27 +0100
From:   Ondrej Mosnacek <omosnace@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>
Cc:     Linux kernel mailing list <linux-kernel@...r.kernel.org>,
        SElinux list <selinux@...r.kernel.org>
Subject: Semantics vs. usage of mutex_is_locked()

Hello,

(This is addressed mainly to the kernel/locking/ maintainers.)

In security/selinux/ima.c, we have two functions for which we want to
assert the expected locking status of a mutex. In the first function
we expect the caller to obtain the lock, so we have
`WARN_ON(!mutex_is_locked(&state->policy_mutex));` there. The second
one, on the contrary, takes the lock on its own, so there is an
inverse assert (that the caller hasn't already taken the lock) -
`WARN_ON(mutex_is_locked(&state->policy_mutex));`.

Recently, I got a report that the second WARN_ON() got triggered,
while there was no function in the call chain that could have taken
the lock. Looking into it, I realized that mutex_is_locked() actually
doesn't check what we assumed ("Are we holding the lock?"), but
instead answers the question "Is any task holding the lock?". So in
theory the second WARN_ON() can fire randomly in otherwise correct
code simply because some other task happens to be
holding the mutex. Similarly, the first assert might not catch all
cases where taking the mutex was forgotten, because another task may
be holding it, making the assert pass.
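
To make the difference concrete, here is a minimal userspace model in C. It is not kernel code: the `toy_mutex` type, the integer task IDs and the helper names are all invented for illustration, and only the two questions being asked come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not the kernel's struct mutex.
 * owner is a nonzero task ID while locked, 0 while unlocked. */
struct toy_mutex {
    int owner;
};

/* Models the question mutex_is_locked() answers:
 * "Is ANY task holding the lock?" */
static bool toy_is_locked(const struct toy_mutex *m)
{
    return m->owner != 0;
}

/* Models the question the asserts were meant to ask:
 * "Is THIS task (nonzero ID) holding the lock?" */
static bool toy_is_held_by(const struct toy_mutex *m, int task)
{
    return m->owner == task;
}

/* Walks through both failure modes from the point of view of task 1
 * while an unrelated task 2 holds the lock. */
static void toy_selfcheck(void)
{
    struct toy_mutex m = { .owner = 2 };

    /* Inverse assert: task 1 holds nothing, yet "is locked" is true,
     * so WARN_ON(is_locked) would fire as a false positive. */
    assert(toy_is_locked(&m));
    assert(!toy_is_held_by(&m, 1));

    /* Forward assert: task 1 forgot to take the lock, yet
     * WARN_ON(!is_locked) stays silent because task 2 holds it. */
    assert(!(!toy_is_locked(&m)));
}
```

In this model, only a check of the `toy_is_held_by()` kind gives the same answer regardless of what other tasks are doing.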

Grepping the whole tree for mutex_is_locked finds about 300 uses, the
vast majority of which are variations of the
warn-if-mutex-not-locked-by-us pattern. Then there are a handful of
cases where the usage of mutex_is_locked() seems correct and a few
cases of the inverse warn-if-mutex-already-locked-by-us pattern.

It seems like introducing a new helper with the "is the mutex locked
by the current task?" semantics would be fairly straightforward.
However, fixing all the mutex_is_locked() misuses would be a rather
big and noisy patch(set). That said, would it be okay if I send
patches that introduce the new helper, update the documentation, and
fix only those misuses that can lead to wrong behavior when the code
is correct (e.g. can yield a false positive WARNING/BUG)? That should
be a reasonably small set of changes, yet should take care of the most
important issues. If anyone cares enough about the rest, they can
always send further patches.

Also, any opinions on the name of the new helper? Perhaps
mutex_is_held()? Or mutex_is_locked_by_current()?
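
For a feel of the proposed semantics, the following is a rough userspace analogue built on POSIX threads, not a proposal for the kernel implementation. Everything here is an assumption made for illustration: the `owned_mutex` wrapper, its function names, and the trick of tagging the owner with the address of a thread-local object.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: a pthread mutex plus a record of its owner. */
struct owned_mutex {
    pthread_mutex_t lock;
    _Atomic uintptr_t owner;    /* 0 when unlocked, else the owner's tag */
};

/* The address of a thread-local object is unique among live threads,
 * so it can serve as a nonzero per-thread tag. */
static uintptr_t self_tag(void)
{
    static _Thread_local char tag;
    return (uintptr_t)&tag;
}

static void owned_mutex_init(struct owned_mutex *m)
{
    pthread_mutex_init(&m->lock, NULL);
    atomic_init(&m->owner, 0);
}

static void owned_mutex_lock(struct owned_mutex *m)
{
    pthread_mutex_lock(&m->lock);
    atomic_store(&m->owner, self_tag());    /* record owner after acquiring */
}

static void owned_mutex_unlock(struct owned_mutex *m)
{
    atomic_store(&m->owner, 0);             /* clear owner before releasing */
    pthread_mutex_unlock(&m->lock);
}

/* "Is any thread holding the lock?" The answer is a snapshot that may
 * already be stale when the caller looks at it. */
static bool owned_mutex_is_locked(struct owned_mutex *m)
{
    return atomic_load(&m->owner) != 0;
}

/* "Is the calling thread holding the lock?" Stable for the caller:
 * only the calling thread ever stores its own tag. */
static bool owned_mutex_is_held(struct owned_mutex *m)
{
    return atomic_load(&m->owner) == self_tag();
}
```

With such a wrapper, the two asserts described earlier would be written against `owned_mutex_is_held()`, and `owned_mutex_is_locked()` would be left for the few callers that really want to know whether anyone holds the lock.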

Thanks,

--
Ondrej Mosnacek
Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.