Message-ID: <20110215180435.GH4255@thunk.org>
Date: Tue, 15 Feb 2011 13:04:35 -0500
From: Ted Ts'o <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: Masayoshi MIZUMA <m.mizuma@...fujitsu.com>,
Andreas Dilger <adilger.kernel@...ger.ca>,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock

On Tue, Feb 15, 2011 at 06:29:54PM +0100, Jan Kara wrote:
> Sadly this does not quite work because even down_read(&sb->s_umount)
> in thaw_super() can block if there is another process that tries to acquire
> s_umount for writing - a situation like:
>
>   TASK 1 (e.g. flusher)      TASK 2 (e.g. remount)        TASK 3 (unfreeze)
> down_read(&sb->s_umount)
>   block on s_frozen
>                              down_write(&sb->s_umount)
>                                -blocked
>                                                           down_read(&sb->s_umount)
>                                                             -blocked
>                                                             behind the write access...
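The interleaving quoted above can be replayed in a small single-threaded model. This is only a sketch: toy_rwsem, toy_down_read(), toy_down_write() and replay_scenario() are hypothetical names, and the model assumes nothing beyond what the quoted text states, namely that a new reader queues behind a writer that is already waiting. It is not the kernel's rwsem implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair reader/writer semaphore (hypothetical, not the kernel's). */
struct toy_rwsem {
    int readers;   /* tasks currently holding the lock for read */
    bool writer;   /* true while a task holds the lock for write */
    int waiters;   /* queued tasks; new requests may not overtake them */
};

/* Returns true if the read lock is granted, false if the caller must queue. */
static bool toy_down_read(struct toy_rwsem *sem)
{
    if (!sem->writer && sem->waiters == 0) {
        sem->readers++;
        return true;
    }
    sem->waiters++;
    return false;
}

/* Returns true if the write lock is granted, false if the caller must queue. */
static bool toy_down_write(struct toy_rwsem *sem)
{
    if (!sem->writer && sem->readers == 0 && sem->waiters == 0) {
        sem->writer = true;
        return true;
    }
    sem->waiters++;
    return false;
}

/* Replays the three steps on a frozen filesystem; returns how many tasks are stuck. */
static int replay_scenario(void)
{
    struct toy_rwsem s_umount = { 0, false, 0 };
    int stuck = 0;

    /* TASK 1 (flusher): gets the read lock, then sleeps on s_frozen. */
    bool t1_has_lock = toy_down_read(&s_umount);
    assert(t1_has_lock && s_umount.readers == 1);
    stuck++;

    /* TASK 2 (remount): a reader holds the lock, so the writer queues. */
    if (!toy_down_write(&s_umount))
        stuck++;

    /* TASK 3 (unfreeze): no writer holds the lock, but one is queued ahead. */
    if (!toy_down_read(&s_umount))
        stuck++;

    return stuck;
}
```

Under that assumption replay_scenario() counts all three tasks as stuck: TASK 1 waits for the thaw, TASK 2 waits for TASK 1 to drop its read lock, and TASK 3, the only task that could thaw the filesystem, waits behind TASK 2.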

OK, sorry for being dense, but why does this cause a deadlock? What
are you imagining TASK 3 doing that would impede the flusher from
eventually resuming? Or how would TASK 3 prevent userspace from
completing whatever it needs to do (say, a device mapper ioctl)?

freeze_fs has always been inherently dangerous if userspace does
not know what it's doing. If it freezes the root file system and
then, while the file system is frozen, attempts to modify
/etc/mtab, it's going to lose. I've in the past argued for some kind
of safety timeout that prevents the system from wedging, but the
arguments I've gotten back are (a) it's too complex, (b) userspace
programmers aren't that stupid, and (c) it could cause the filesystem
to unfreeze when userspace wasn't expecting it. Oh, and (d) if the
system wedges up due to userspace being stupid, that's acceptable.

Obviously, if the kernel does something to itself that causes a
deadlock, we need to fix it, but userspace doing something stupid has
been explicitly ruled out of scope, at least in previous
discussions...

> And in particular ext4 has another deadlock of this kind because it does
> IO from ext4_remount() e.g. when doing online resize (I know it's a bit
> artificial but still ;).

OK, I'm being dense again. How do remount and online resize relate
to each other? And it's not I/O in general that is the problem; it's
writeback activity, because it takes a read lock on s_umount,
right?

						- Ted