[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87r5pfw6ew.fsf@openvz.org>
Date: Mon, 25 Jan 2010 00:15:51 +0300
From: Dmitry Monakhov <dmonakhov@...nvz.org>
To: Al Viro <viro@...IV.linux.org.uk>
Cc: linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
Al Viro <viro@...IV.linux.org.uk> writes:
> On Sun, Jan 24, 2010 at 02:41:15PM +0300, Dmitry Monakhov wrote:
>> Currently on rw=>ro remount we have following race
>> | mount /mnt -oremount,ro | write-task |
>> |-------------------------+------------|
>> | | open(RDWR) |
>> | shrink_dcache_sb(sb); | |
>> | sync_filesystem(sb); | |
>> | | write() |
>> | | close() |
>> | fs_may_remount_ro(sb) | |
>> | sb->s_flags = new_flags | |
>> Later writeback or sync() will result in error due to MS_RDONLY flag
>> In case of ext4 this result in jbd2_start failure on writeback
>> ext4_da_writepages: jbd2_start: 1024 pages, ino 1431; err -30
>> In fact all others are affected by this error but it is not visible
>> because the skip s_flags check on writeback. For example ext3 check
>> (s_flags & MS_RDONLY) only if page has no buffers during journal start.
>>
>> In order to prevent the race we have to block new writers before
>> fs_may_remount_ro() and sync_filesystem(). Let's introduce new
>> sb->s_flags MS_RO_REMOUNT flag for this purpose. But suddenly we have
>> no available space in MS_XXX bits, let's share this bit with MS_REMOUNT.
>> This is possible because MS_REMOUNT used only for passing arguments
>> from flags to sys_mount() and never used in sb->s_flags.
>
> It's not a solution. You get an _attempted_ remount ro making writes
> fail, even if it's going to be unsuccessful. No go...
We have two options for new writers:
1) Fail it via -EROFS
Yes, remount may fail, but it is really unlikely.
2) Defer(block) new writers on until we complete or fail remount
for example like follows. Do you like second solution ?
diff --git a/fs/namespace.c b/fs/namespace.c
index c768f73..daf3c5a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -196,6 +196,15 @@ int __mnt_is_readonly(struct vfsmount *mnt)
return 1;
if (mnt->mnt_sb->s_flags & MS_RDONLY)
return 1;
+ if (mnt->mnt_sb->s_flags & MS_RO_REMOUNT) {
+ int ret = 0;
+ /* Serialize against remount */
+ down_read(&mnt->mnt_sb->s_umount);
+ if (mnt->mnt_sb->s_flags & MS_RDONLY)
+ ret = 1;
+ up_read(&mnt->mnt_sb->s_umount);
+ return ret;
+ }
return 0;
}
EXPORT_SYMBOL_GPL(__mnt_is_readonly);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists