linux-kernel - Re: [PATCH] PM / Freezer: Freeze filesystems while freezing processes (v2)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110807001446.GI3162@dastard>
Date:	Sun, 7 Aug 2011 10:14:46 +1000
From:	Dave Chinner <david@...morbit.com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Linux PM mailing list <linux-pm@...ts.linux-foundation.org>,
	Pavel Machek <pavel@....cz>,
	Nigel Cunningham <nigel@...onice.net>,
	Christoph Hellwig <hch@...radead.org>,
	Christoph <cr2005@...lub.de>, xfs@....sgi.com,
	LKML <linux-kernel@...r.kernel.org>, linux-ext4@...r.kernel.org,
	Theodore Ts'o <tytso@....edu>, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] PM / Freezer: Freeze filesystems while freezing
 processes (v2)

On Sat, Aug 06, 2011 at 11:17:18PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rjw@...k.pl>
> 
> Freeze all filesystems during the freezing of tasks by calling
> freeze_bdev() for each of them and thaw them during the thawing
> of tasks with the help of thaw_bdev().
> 
> This is needed by hibernation, because some filesystems (e.g. XFS)
> deadlock with the preallocation of memory used by it if the memory
> pressure caused by it is too heavy.
> 
> The additional benefit of this change is that, if something goes
> wrong after filesystems have been frozen, they will stay in a
> consistent state and journal replays won't be necessary (e.g. after
> a failing suspend or resume).  In particular, this should help to
> solve a long-standing issue that in some cases during resume from
> hibernation the boot loader causes the journal to be replied for the
> filesystem containing the kernel image and initrd causing it to
> become inconsistent with the information stored in the hibernation
> image.
> 
> This change is based on earlier work by Nigel Cunningham.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@...k.pl>
> ---
> 
> OK, so nobody except for Pavel appears to have any comments, so I assume
> that everyone except for Pavel is fine with the approach, interestingly enough.
> 
> I've removed the MS_FROZEN Pavel complained about from freeze_filesystems()
> and added comments explaining why lockdep_off/on() are used.
> 
> Thanks,
> Rafael
> 
> ---
>  fs/block_dev.c         |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h     |    6 +++++
>  kernel/power/process.c |    7 +++++-
>  3 files changed, 68 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6/include/linux/fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/fs.h
> +++ linux-2.6/include/linux/fs.h
> @@ -211,6 +211,7 @@ struct inodes_stat_t {
>  #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
>  #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
>  #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
> +#define MS_FROZEN	(1<<25) /* bdev has been frozen */
>  #define MS_NOSEC	(1<<28)
>  #define MS_BORN		(1<<29)
>  #define MS_ACTIVE	(1<<30)
> @@ -2047,6 +2048,8 @@ extern struct super_block *freeze_bdev(s
>  extern void emergency_thaw_all(void);
>  extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
>  extern int fsync_bdev(struct block_device *);
> +extern void freeze_filesystems(void);
> +extern void thaw_filesystems(void);
>  #else
>  static inline void bd_forget(struct inode *inode) {}
>  static inline int sync_blockdev(struct block_device *bdev) { return 0; }
> @@ -2061,6 +2064,9 @@ static inline int thaw_bdev(struct block
>  {
>  	return 0;
>  }
> +
> +static inline void freeze_filesystems(void) {}
> +static inline void thaw_filesystems(void) {}
>  #endif
>  extern int sync_filesystem(struct super_block *);
>  extern const struct file_operations def_blk_fops;
> Index: linux-2.6/fs/block_dev.c
> ===================================================================
> --- linux-2.6.orig/fs/block_dev.c
> +++ linux-2.6/fs/block_dev.c
> @@ -314,6 +314,62 @@ out:
>  }
>  EXPORT_SYMBOL(thaw_bdev);
>  
> +/**
> + * freeze_filesystems - Force all filesystems into a consistent state.
> + */
> +void freeze_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary, because some filesystems (e.g. ext3) lock
> +	 * mutexes in their .freeze_fs() callbacks and leave them locked for
> +	 * their .unfreeze_fs() callbacks to unlock.  This is done under
> +	 * bdev->bd_fsfreeze_mutex, which is then released, but it makes
> +	 * lockdep think something may be wrong when freeze_bdev() attempts
> +	 * to acquire bdev->bd_fsfreeze_mutex for the next filesystem.
> +	 */
> +	lockdep_off();

I thought those problems were fixed. If they aren't, then they most
certainly need to be because holding mutexes over system calls is a
bug.

Well, well:

[252182.603134] ================================================
[252182.604832] [ BUG: lock held when returning to user space! ]
[252182.606086] ------------------------------------------------
[252182.607400] xfs_io/4917 is leaving the kernel with locks still held!
[252182.608905] 1 lock held by xfs_io/4917:
[252182.609739]  #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff812a2aaf>] journal_lock_updates+0xef/0x100

<sigh>

Looks like the problem was fixed for ext4, but not ext3.  Please
report this to the ext3/4 list and get it fixed, don't work around
it here.

> +	/*
> +	 * Freeze in reverse order so filesystems depending on others are
> +	 * frozen in the right order (eg. loopback on ext3).
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (!sb->s_root || !sb->s_bdev ||
> +		    (sb->s_frozen == SB_FREEZE_TRANS) ||
> +		    (sb->s_flags & MS_RDONLY))
> +			continue;
> +
> +		freeze_bdev(sb->s_bdev);
> +		sb->s_flags |= MS_FROZEN;
> +	}

AFAIK, that won't work for btrfs - you have to call freeze_super()
directly for btrfs because it has a special relationship with
sb->s_bdev. And besides, all freeze_bdev does is get an active
reference on the superblock and call freeze_super().

Also, that's traversing the list of superblock with locking and
dereferencing the superblock without properly checking that the
superblock is not being torn down. You should probably use
iterate_supers (or at least copy the code), with a function that
drops the s_umount read lock befor calling freeze_super() and then
picks it back up afterwards.

> +
> +	lockdep_on();
> +}
> +
> +/**
> + * thaw_filesystems - Make all filesystems active again.
> + */
> +void thaw_filesystems(void)
> +{
> +	struct super_block *sb;
> +
> +	/*
> +	 * This is necessary for the same reason as in freeze_filesystems()
> +	 * above.
> +	 */
> +	lockdep_off();
> +
> +	list_for_each_entry(sb, &super_blocks, s_list)
> +		if (sb->s_flags & MS_FROZEN) {
> +			sb->s_flags &= ~MS_FROZEN;
> +			thaw_bdev(sb->s_bdev, sb);
> +		}

And once again, iterate_supers() is what you want here. And you
should only call thaw_bdev() as it needs to do checks other than
checking MS_FROZEN e.g. the above will unfreeze filesystems that
were already frozen at the time a suspend occurs, and that could
lead to corruption depending on why the filesystem was frozen...

Also, you still need to check for a valid sb->s_bdev here, otherwise
<splat>.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/