linux-cve-announce - CVE-2025-37988: fix a couple of races in MNT_TREE_BENEATH handling by do_move

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <2025052038-CVE-2025-37988-1fa1@gregkh>
Date: Tue, 20 May 2025 19:07:41 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-cve-announce@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...nel.org>
Subject: CVE-2025-37988: fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount()

From: Greg Kroah-Hartman <gregkh@...nel.org>

Description
===========

In the Linux kernel, the following vulnerability has been resolved:

fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount()

Normally do_lock_mount(path, _) is locking a mountpoint pinned by
*path and at the time when matching unlock_mount() unlocks that
location it is still pinned by the same thing.

Unfortunately, for 'beneath' case it's no longer that simple -
the object being locked is not the one *path points to.  It's the
mountpoint of path->mnt.  The thing is, without sufficient locking
->mnt_parent may change under us and none of the locks are held
at that point.  The rules are
	* mount_lock stabilizes m->mnt_parent for any mount m.
	* namespace_sem stabilizes m->mnt_parent, provided that
m is mounted.
	* if either of the above holds and refcount of m is positive,
we are guaranteed the same for refcount of m->mnt_parent.

namespace_sem nests inside inode_lock(), so do_lock_mount() has
to take inode_lock() before grabbing namespace_sem.  It does
recheck that path->mnt is still mounted in the same place after
getting namespace_sem, and it does take care to pin the dentry.
It is needed, since otherwise we might end up with racing mount --move
(or umount) happening while we were getting locks; in that case
dentry would no longer be a mountpoint and could've been evicted
on memory pressure along with its inode - not something you want
when grabbing lock on that inode.

However, pinning a dentry is not enough - the matching mount is
also pinned only by the fact that path->mnt is mounted on top it
and at that point we are not holding any locks whatsoever, so
the same kind of races could end up with all references to
that mount gone just as we are about to enter inode_lock().
If that happens, we are left with filesystem being shut down while
we are holding a dentry reference on it; results are not pretty.

What we need to do is grab both dentry and mount at the same time;
that makes inode_lock() safe *and* avoids the problem with fs getting
shut down under us.  After taking namespace_sem we verify that
path->mnt is still mounted (which stabilizes its ->mnt_parent) and
check that it's still mounted at the same place.  From that point
on to the matching namespace_unlock() we are guaranteed that
mount/dentry pair we'd grabbed are also pinned by being the mountpoint
of path->mnt, so we can quietly drop both the dentry reference (as
the current code does) and mnt one - it's OK to do under namespace_sem,
since we are not dropping the final refs.

That solves the problem on do_lock_mount() side; unlock_mount()
also has one, since dentry is guaranteed to stay pinned only until
the namespace_unlock().  That's easy to fix - just have inode_unlock()
done earlier, while it's still pinned by mp->m_dentry.

The Linux kernel CVE team has assigned CVE-2025-37988 to this issue.


Affected and fixed versions
===========================

	Issue introduced in 6.5 with commit 6ac392815628f317fcfdca1a39df00b9cc4ebc8b and fixed in 6.6.89 with commit 4f435c1f4c48ff84968e2d9159f6fa41f46cf998
	Issue introduced in 6.5 with commit 6ac392815628f317fcfdca1a39df00b9cc4ebc8b and fixed in 6.12.26 with commit a61afd54826ac24c2c93845c4f441dbc344875b1
	Issue introduced in 6.5 with commit 6ac392815628f317fcfdca1a39df00b9cc4ebc8b and fixed in 6.14.5 with commit d4b21e8cd3d7efa2deb9cff534f0133e84f35086
	Issue introduced in 6.5 with commit 6ac392815628f317fcfdca1a39df00b9cc4ebc8b and fixed in 6.15-rc4 with commit 0d039eac6e5950f9d1ecc9e410c2fd1feaeab3b6

Please see https://www.kernel.org for a full list of currently supported
kernel versions by the kernel community.

Unaffected versions might change over time as fixes are backported to
older supported kernel versions.  The official CVE entry at
	https://cve.org/CVERecord/?id=CVE-2025-37988
will be updated if fixes are backported, please check that for the most
up to date information about this issue.


Affected files
==============

The file(s) affected by this issue are:
	fs/namespace.c


Mitigation
==========

The Linux kernel CVE team recommends that you update to the latest
stable kernel version for this, and many other bugfixes.  Individual
changes are never tested alone, but rather are part of a larger kernel
release.  Cherry-picking individual commits is not recommended or
supported by the Linux kernel community at all.  If however, updating to
the latest release is impossible, the individual changes to resolve this
issue can be found at these commits:
	https://git.kernel.org/stable/c/4f435c1f4c48ff84968e2d9159f6fa41f46cf998
	https://git.kernel.org/stable/c/a61afd54826ac24c2c93845c4f441dbc344875b1
	https://git.kernel.org/stable/c/d4b21e8cd3d7efa2deb9cff534f0133e84f35086
	https://git.kernel.org/stable/c/0d039eac6e5950f9d1ecc9e410c2fd1feaeab3b6