Message-Id: <20220422105231.197721-1-brauner@kernel.org>
Date:   Fri, 22 Apr 2022 12:52:31 +0200
From:   Christian Brauner <brauner@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [GIT PULL] fs: MNT_WRITE_HOLD fix

Hey Linus,

/* Summary */
The recent cleanup in e257039f0fc7 ("mount_setattr(): clean the control flow
and calling conventions") switched the mount attribute codepaths from do-while
to for loops as they are more idiomatic when walking mounts.

However, we did originally choose do-while constructs because if we request a
mount or mount tree to be made read-only we need to hold writers in the
following way: The mount attribute code will grab lock_mount_hash() and then
call mnt_hold_writers() which will _unconditionally_ set MNT_WRITE_HOLD on the
mount.

Any callers that need write access have to call mnt_want_write(). They will
immediately see that MNT_WRITE_HOLD is set on the mount and the caller will
then either spin (on non-preempt-rt) or wait on lock_mount_hash() (on
preempt-rt).

The fact that MNT_WRITE_HOLD is set unconditionally means that once
mnt_hold_writers() returns we need to _always_ pair it with
mnt_unhold_writers() in both the failure and success paths.

The do-while constructs took care of this. But after Al's change, the for loop
in the failure path stops at the first mount whose attributes we failed to
change, _without_ entering the loop body to call mnt_unhold_writers() on it.

This in turn means that once we fail to make a mount read-only via
mount_setattr() - i.e. there are already writers on that mount - we will block
any writers on that mount indefinitely. Fix this by ensuring that the for loop
always unsets MNT_WRITE_HOLD, including on the first mount we failed to change
to read-only. Also sprinkle a few comments into the cleanup code to remind
people, myself included, about what is happening. After all, I didn't catch it
during review.
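
To illustrate the difference between the two cleanup loops, here is a minimal
user-space sketch. It is not the code in fs/namespace.c: struct model_mount,
the MODEL_* flags and all model_* functions are hypothetical stand-ins, the
mount tree is flattened into a singly linked list, and locking is left out
entirely. It only assumes what is described above, namely that the hold flag
is set unconditionally before the writer check can fail.

```c
#include <stddef.h>

/* Hypothetical stand-in for MNT_WRITE_HOLD; not the kernel's value. */
#define MODEL_WRITE_HOLD 0x1u

/* Hypothetical, simplified mount: a flag word, a writer count, a next link. */
struct model_mount {
	unsigned int flags;
	int writers;
	struct model_mount *next;
};

/*
 * Models the behaviour described for mnt_hold_writers(): the hold flag is
 * set unconditionally, and only then do we fail if writers are present.
 */
int model_hold_writers(struct model_mount *m)
{
	m->flags |= MODEL_WRITE_HOLD;
	return m->writers > 0 ? -1 : 0;
}

void model_unhold_writers(struct model_mount *m)
{
	m->flags &= ~MODEL_WRITE_HOLD;
}

/* Walk the list and hold writers; return the first mount that failed. */
struct model_mount *model_prepare(struct model_mount *first)
{
	for (struct model_mount *m = first; m; m = m->next)
		if (model_hold_writers(m) != 0)
			return m;
	return NULL;
}

/* Broken cleanup: the loop condition stops before the failed mount. */
void model_cleanup_buggy(struct model_mount *first, struct model_mount *failed)
{
	for (struct model_mount *m = first; m != failed; m = m->next)
		model_unhold_writers(m);
}

/* Fixed cleanup: unhold the failed mount too, then stop walking. */
void model_cleanup_fixed(struct model_mount *first, struct model_mount *failed)
{
	for (struct model_mount *m = first; m; m = m->next) {
		model_unhold_writers(m);
		if (m == failed)
			break;
	}
}
```

With a list a -> b -> c where only b has a writer, model_prepare() returns b
with the hold flag set on both a and b. model_cleanup_buggy() clears it on a
only, so b stays held, whereas model_cleanup_fixed() clears it on b as well
and never touches c.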

This is only relevant on mainline and was reported by syzbot. Details about the
syzbot reports are all in the commit message.

/* Testing */
All patches are based on v5.18-rc3 and have been sitting in linux-next. No
build failures or warnings were observed. Syzbot was unable to reproduce the
issue with this patch applied.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up doing a test-merge with current
mainline.

The following changes since commit b2d229d4ddb17db541098b83524d901257e93845:

  Linux 5.18-rc3 (2022-04-17 13:57:31 -0700)

are available in the Git repository at:

  git@...olite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/fs.fixes.v5.18-rc4

for you to fetch changes up to 0014edaedfd804dbf35b009808789325ca615716:

  fs: unset MNT_WRITE_HOLD on failure (2022-04-21 17:57:37 +0200)

Please consider pulling these changes from the signed fs.fixes.v5.18-rc4 tag.

Thanks!
Christian

----------------------------------------------------------------
fs.fixes.v5.18-rc4

----------------------------------------------------------------
Christian Brauner (1):
      fs: unset MNT_WRITE_HOLD on failure

 fs/namespace.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)