linux-kernel - Re: [PATCH] Freeze bdevs when freezing processes.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200610300029.25555.rjw@sisk.pl>
Date:	Mon, 30 Oct 2006 00:29:24 +0100
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Pavel Machek <pavel@....cz>
Cc:	David Chinner <dgc@....com>,
	Nigel Cunningham <ncunningham@...uxmail.org>,
	Andrew Morton <akpm@...l.org>,
	LKML <linux-kernel@...r.kernel.org>, xfs@....sgi.com,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] Freeze bdevs when freezing processes.

Hi,

On Sunday, 29 October 2006 18:35, Pavel Machek wrote:
> Hi!
> 
> > > > > As you have them at the moment, the threads seem to be freezing fine.
> > > > > The issue I've seen in the past related not to threads but to timer
> > > > > based activity. Admittedly it was 2.6.14 when I last looked at it, but
> > > > > there used to be a possibility for XFS to submit I/O from a timer when
> > > > > the threads are frozen but the bdev isn't frozen. Has that changed?
> > > > 
> > > > I didn't think we've ever done that - periodic or delayed operations
> > > > are passed off to the kernel threads to execute. A stack trace
> > > > (if you still have it) would be really help here.
> > > > 
> > > > Hmmm - we have a couple of per-cpu work queues as well that are
> > > > used on I/O completion and that can, in some circumstances,
> > > > trigger new transactions. If we are only flush metadata, then
> > > > I don't think that any more I/o will be issued, but I could be
> > > > wrong (maze of twisty passages).
> > > 
> > > Well, I think this exactly is the problem, because worker_threads run with
> > > PF_NOFREEZE set (as I've just said in another message).
> > 
> > Ok, so freezing the filesystem is the only way you can prevent
> > this as the workqueues are flushed as part of quiescing the filesystem.
> 
> Well, alternative is to teach XFS to sense that we are being frozen
> and stop disk writes in such case.
> 
> OTOH freeze_bdevs is perhaps not that bad solution... 

Okay, appended is a patch that implements the freezing of bdevs in a slightly
different way than the Nigel's patch did it.

As Christoph suggested, I have put freeze_filesystems() and thaw_filesystems()
into fs/buffer.c and indroduced the MS_FROZEN flag to mark frozen
filesystems.

It seems to work fine, except I get the following trace from lockdep during
the suspend on a regular basis (not 100% reproducible, though):

Stopping tasks...
=============================================
[ INFO: possible recursive locking detected ]
2.6.19-rc2-mm2 #15
---------------------------------------------
s2disk/5564 is trying to acquire lock:
 (&bdev->bd_mount_mutex){--..}, at: [<ffffffff80475e79>] mutex_lock+0x9/0x10

but task is already holding lock:
 (&bdev->bd_mount_mutex){--..}, at: [<ffffffff80475e79>] mutex_lock+0x9/0x10

other info that might help us debug this:
3 locks held by s2disk/5564:
 #0:  (&bdev->bd_mount_mutex){--..}, at: [<ffffffff80475e79>] mutex_lock+0x9/0x10
 #1:  (&type->s_umount_key#16){----}, at: [<ffffffff80291647>] get_super+0x67/0xc0
 #2:  (&journal->j_barrier){--..}, at: [<ffffffff80475e79>] mutex_lock+0x9/0x10

stack backtrace:

Call Trace:
 [<ffffffff8020af79>] dump_trace+0xb9/0x430
 [<ffffffff8020b333>] show_trace+0x43/0x60
 [<ffffffff8020b635>] dump_stack+0x15/0x20
 [<ffffffff8024a1d1>] __lock_acquire+0x881/0xc60
 [<ffffffff8024a94d>] lock_acquire+0x8d/0xc0
 [<ffffffff80475cd4>] __mutex_lock_slowpath+0xd4/0x270
 [<ffffffff80475e79>] mutex_lock+0x9/0x10
 [<ffffffff802b2bb6>] freeze_bdev+0x16/0x80
 [<ffffffff802b3105>] freeze_filesystems+0x55/0x80
 [<ffffffff80255942>] freeze_processes+0x1e2/0x360
 [<ffffffff802592a3>] snapshot_ioctl+0x163/0x610
 [<ffffffff8029cf0b>] do_ioctl+0x6b/0xa0
 [<ffffffff8029d1eb>] vfs_ioctl+0x2ab/0x2d0
 [<ffffffff8029d27a>] sys_ioctl+0x6a/0xa0
 [<ffffffff80209c2e>] system_call+0x7e/0x83
 [<00002afb13a4d8a9>]

done.
Shrinking memory... done (19126 pages freed)

Greetings,
Rafael


Signed-off-by: Rafael J. Wysocki <rjw@...k.pl>
---
 fs/buffer.c                 |   38 +++++++++++++++++++++++++++++++++
 include/linux/buffer_head.h |    2 +
 include/linux/fs.h          |    1 
 kernel/power/process.c      |   50 +++++++++++++++++++++++++++++---------------
 4 files changed, 74 insertions(+), 17 deletions(-)

Index: linux-2.6.19-rc2-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/power/process.c
+++ linux-2.6.19-rc2-mm2/kernel/power/process.c
@@ -14,6 +14,7 @@
 #include <linux/module.h>
 #include <linux/syscalls.h>
 #include <linux/freezer.h>
+#include <linux/buffer_head.h>
 
 /* 
  * Timeout for stopping processes
@@ -119,7 +120,7 @@ int freeze_processes(void)
 		read_unlock(&tasklist_lock);
 		todo += nr_user;
 		if (!user_frozen && !nr_user) {
-			sys_sync();
+			freeze_filesystems();
 			start_time = jiffies;
 		}
 		user_frozen = !nr_user;
@@ -156,28 +157,43 @@ int freeze_processes(void)
 void thaw_some_processes(int all)
 {
 	struct task_struct *g, *p;
-	int pass = 0; /* Pass 0 = Kernel space, 1 = Userspace */
 
 	printk("Restarting tasks... ");
 	read_lock(&tasklist_lock);
-	do {
-		do_each_thread(g, p) {
-			/*
-			 * is_user = 0 if kernel thread or borrowed mm,
-			 * 1 otherwise.
-			 */
-			int is_user = !!(p->mm && !(p->flags & PF_BORROWED_MM));
-			if (!freezeable(p) || (is_user != pass))
-				continue;
-			if (!thaw_process(p))
-				printk(KERN_INFO
-					"Strange, %s not stopped\n", p->comm);
-		} while_each_thread(g, p);
 
-		pass++;
-	} while (pass < 2 && all);
+	do_each_thread(g, p) {
+		if (!freezeable(p))
+			continue;
+
+		/* Don't thaw userland processes, for now */
+		if (p->mm && !(p->flags & PF_BORROWED_MM))
+			continue;
+
+		if (!thaw_process(p))
+			printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
+	} while_each_thread(g, p);
+
+	read_unlock(&tasklist_lock);
+	if (!all)
+		goto Exit;
+
+	thaw_filesystems();
+	read_lock(&tasklist_lock);
+
+	do_each_thread(g, p) {
+		if (!freezeable(p))
+			continue;
+
+		/* Kernel threads should have been thawed already */
+		if (!p->mm || (p->flags & PF_BORROWED_MM))
+			continue;
+
+		if (!thaw_process(p))
+			printk(KERN_INFO " Strange, %s not stopped\n", p->comm );
+	} while_each_thread(g, p);
 
 	read_unlock(&tasklist_lock);
+Exit:
 	schedule();
 	printk("done.\n");
 }
Index: linux-2.6.19-rc2-mm2/include/linux/buffer_head.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/buffer_head.h
+++ linux-2.6.19-rc2-mm2/include/linux/buffer_head.h
@@ -170,6 +170,8 @@ wait_queue_head_t *bh_waitq_head(struct 
 int fsync_bdev(struct block_device *);
 struct super_block *freeze_bdev(struct block_device *);
 void thaw_bdev(struct block_device *, struct super_block *);
+void freeze_filesystems(void);
+void thaw_filesystems(void);
 int fsync_super(struct super_block *);
 int fsync_no_super(struct block_device *);
 struct buffer_head *__find_get_block(struct block_device *, sector_t, int);
Index: linux-2.6.19-rc2-mm2/include/linux/fs.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/fs.h
+++ linux-2.6.19-rc2-mm2/include/linux/fs.h
@@ -120,6 +120,7 @@ extern int dir_notify_enable;
 #define MS_PRIVATE	(1<<18)	/* change to private */
 #define MS_SLAVE	(1<<19)	/* change to slave */
 #define MS_SHARED	(1<<20)	/* change to shared */
+#define MS_FROZEN	(1<<21)	/* Frozen by freeze_filesystems() */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
 
Index: linux-2.6.19-rc2-mm2/fs/buffer.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/fs/buffer.c
+++ linux-2.6.19-rc2-mm2/fs/buffer.c
@@ -244,6 +244,44 @@ void thaw_bdev(struct block_device *bdev
 }
 EXPORT_SYMBOL(thaw_bdev);
 
+/**
+ * freeze_filesystems - lock all filesystems and force them into a consistent
+ * state
+ */
+void freeze_filesystems(void)
+{
+	struct super_block *sb;
+
+	/*
+	 * Freeze in reverse order so filesystems dependant upon others are
+	 * frozen in the right order (eg. loopback on ext3).
+	 */
+	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
+		if (!sb->s_root || !sb->s_bdev ||
+		    (sb->s_frozen == SB_FREEZE_TRANS) ||
+		    (sb->s_flags & MS_RDONLY) ||
+		    (sb->s_flags & MS_FROZEN))
+			continue;
+
+		freeze_bdev(sb->s_bdev);
+		sb->s_flags |= MS_FROZEN;
+	}
+}
+
+/**
+ * thaw_filesystems - unlock all filesystems
+ */
+void thaw_filesystems(void)
+{
+	struct super_block *sb;
+
+	list_for_each_entry(sb, &super_blocks, s_list)
+		if (sb->s_flags & MS_FROZEN) {
+			sb->s_flags &= ~MS_FROZEN;
+			thaw_bdev(sb->s_bdev, sb);
+		}
+}
+
 /*
  * Various filesystems appear to want __find_get_block to be non-blocking.
  * But it's the page lock which protects the buffers.  To get around this,
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/