lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 6 Sep 2023 15:26:21 +0200 (CEST)
From:   Mikulas Patocka <mpatocka@...hat.com>
To:     Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        Zdenek Kabelac <zkabelac@...hat.com>
cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        dm-devel@...hat.com
Subject: [PATCH] fix writing to the filesystem after unmount

lvm may suspend any logical volume anytime. If lvm suspend races with
unmount, it may be possible that the kernel writes to the filesystem after
unmount successfully returned. The problem can be demonstrated with this
script:

#!/bin/sh -ex
modprobe brd rd_size=4194304
vgcreate vg /dev/ram0
lvcreate -L 16M -n lv vg
mkfs.ext4 /dev/vg/lv
mount -t ext4 /dev/vg/lv /mnt/test
dmsetup suspend /dev/vg/lv
(sleep 1; dmsetup resume /dev/vg/lv) &
umount /mnt/test
md5sum /dev/vg/lv
md5sum /dev/vg/lv
dmsetup remove_all
rmmod brd

The script unmounts the filesystem and runs md5sum twice, the result is
that these two invocations return different hash.

What happens:
* dmsetup suspend calls freeze_bdev, that goes to freeze_super and it
  increments sb->s_active
* then we unmount the filesystem, we go to cleanup_mnt, cleanup_mnt calls
  deactivate_super, deactivate_super sees that sb->s_active is 2, so it
  decreases it to 1 and does nothing - the umount syscall returns
  successfully
* now we have a mounted filesystem despite the fact that umount returned
* we call md5sum, this waits for the block device being unblocked
* dmsetup resume unblocks the block device and calls thaw_bdev, that calls
  thaw_super and thaw_super_locked
* thaw_super_locked calls deactivate_locked_super, this actually drops the
  refcount and performs the unmount. The unmount races with md5sum. md5sum
  wins the race and it returns the hash of the filesystem before it was
  unmounted
* the second md5sum returns the hash after the filesystem was unmounted

In order to fix this bug, this patch introduces a new function
wait_and_deactivate_super that will wait if the filesystem is frozen and
perform deactivate_locked_super only after that.

Signed-off-by: Mikulas Patocka <mpatocka@...hat.com>
Cc: stable@...r.kernel.org

---
 fs/namespace.c     |    2 +-
 fs/super.c         |   20 ++++++++++++++++++++
 include/linux/fs.h |    1 +
 3 files changed, 22 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/namespace.c
===================================================================
--- linux-2.6.orig/fs/namespace.c	2023-09-06 09:45:54.000000000 +0200
+++ linux-2.6/fs/namespace.c	2023-09-06 09:47:15.000000000 +0200
@@ -1251,7 +1251,7 @@ static void cleanup_mnt(struct mount *mn
 	}
 	fsnotify_vfsmount_delete(&mnt->mnt);
 	dput(mnt->mnt.mnt_root);
-	deactivate_super(mnt->mnt.mnt_sb);
+	wait_and_deactivate_super(mnt->mnt.mnt_sb);
 	mnt_free_id(mnt);
 	call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt);
 }
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c	2023-09-05 21:09:16.000000000 +0200
+++ linux-2.6/fs/super.c	2023-09-06 09:52:20.000000000 +0200
@@ -36,6 +36,7 @@
 #include <linux/lockdep.h>
 #include <linux/user_namespace.h>
 #include <linux/fs_context.h>
+#include <linux/delay.h>
 #include <uapi/linux/mount.h>
 #include "internal.h"
 
@@ -365,6 +366,25 @@ void deactivate_super(struct super_block
 EXPORT_SYMBOL(deactivate_super);
 
 /**
+ *	wait_and_deactivate_super	-	wait for unfreeze and drop a reference
+ *	@s: superblock to deactivate
+ *
+ *	Variant of deactivate_super(), except that we wait if the filesystem is
+ *	frozen. This is required on unmount, to make sure that the filesystem is
+ *	really unmounted when this function returns.
+ */
+void wait_and_deactivate_super(struct super_block *s)
+{
+	down_write(&s->s_umount);
+	while (s->s_writers.frozen != SB_UNFROZEN) {
+		up_write(&s->s_umount);
+		msleep(1);
+		down_write(&s->s_umount);
+	}
+	deactivate_locked_super(s);
+}
+
+/**
  *	grab_super - acquire an active reference
  *	@s: reference we are trying to make active
  *
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2023-09-05 21:09:16.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2023-09-06 09:46:56.000000000 +0200
@@ -2262,6 +2262,7 @@ void kill_anon_super(struct super_block
 void kill_litter_super(struct super_block *sb);
 void deactivate_super(struct super_block *sb);
 void deactivate_locked_super(struct super_block *sb);
+void wait_and_deactivate_super(struct super_block *sb);
 int set_anon_super(struct super_block *s, void *data);
 int set_anon_super_fc(struct super_block *s, struct fs_context *fc);
 int get_anon_bdev(dev_t *);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ