Message-ID: <20110215191125.GL17313@quack.suse.cz>
Date: Tue, 15 Feb 2011 20:11:25 +0100
From: Jan Kara <jack@...e.cz>
To: Ted Ts'o <tytso@....edu>
Cc: Jan Kara <jack@...e.cz>,
Masayoshi MIZUMA <m.mizuma@...fujitsu.com>,
Andreas Dilger <adilger.kernel@...ger.ca>,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock

On Tue 15-02-11 13:04:35, Ted Ts'o wrote:
> On Tue, Feb 15, 2011 at 06:29:54PM +0100, Jan Kara wrote:
> > Sadly this does not quite work because even down_read(&sb->s_umount)
> > in thaw_super() can block if there is another process that tries to acquire
> > s_umount for writing - a situation like:
> >   TASK 1 (e.g. flusher)     TASK 2 (e.g. remount)      TASK 3 (unfreeze)
> > down_read(&sb->s_umount)
> >   block on s_frozen
> >                             down_write(&sb->s_umount)
> >                               -blocked
> >                                                        down_read(&sb->s_umount)
> >                                                          -blocked
> >                                                           behind the write access...
>
> OK, sorry for being dense, but why does this cause a deadlock? What
> are you imagining TASK 3 doing that would impede the flusher from
> eventually resuming? Or how would TASK 3 prevent userspace from
> completing whatever it needs to do (say, a device mapper ioctl)?

  I was arguing that using down_read(&sb->s_umount) in thaw_super() instead
of down_write() does not solve anything. The deadlock as originally
reported can still happen; you just need another task (TASK 2 in the above
scheme) to block in down_write() before thaw_super() runs. TASK 3 then
queues behind that writer, so the filesystem is never thawed and TASK 1
never releases s_umount.

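To make the ordering concrete, here is a toy model of the scheme above. It
is only a sketch: it assumes a strictly fair (FIFO) reader/writer semaphore,
and the class and function names (FairRwsem, thaw_can_proceed) are made up
for illustration and are not kernel interfaces.

```python
from collections import deque


class FairRwsem:
    """Toy fair reader/writer semaphore: nobody overtakes a queued waiter."""

    def __init__(self):
        self.readers = set()     # tasks holding the lock for reading
        self.writer = None       # task holding the lock for writing
        self.waiters = deque()   # queued (task, mode) pairs, FIFO

    def acquire(self, task, mode):
        """Return True if granted at once, False if the task has to queue."""
        if mode == "read":
            # A new reader is refused while a writer holds the lock or
            # while anybody (e.g. a writer) is already waiting.
            granted = self.writer is None and not self.waiters
        else:
            granted = (self.writer is None and not self.readers
                       and not self.waiters)
        if granted:
            if mode == "read":
                self.readers.add(task)
            else:
                self.writer = task
            return True
        self.waiters.append((task, mode))
        return False


def thaw_can_proceed(with_remount, thaw_mode="read"):
    """Replay the three-task scheme; report whether the thaw gets s_umount."""
    s_umount = FairRwsem()
    # TASK 1: the flusher takes s_umount for reading, then sleeps on
    # s_frozen. It releases s_umount only after the filesystem is thawed.
    s_umount.acquire("flusher", "read")
    # TASK 2: remount wants s_umount for writing and queues behind TASK 1.
    if with_remount:
        s_umount.acquire("remount", "write")
    # TASK 3: the unfreeze path.
    return s_umount.acquire("unfreeze", thaw_mode)


if __name__ == "__main__":
    print("no writer queued:", thaw_can_proceed(with_remount=False))
    print("writer queued:   ", thaw_can_proceed(with_remount=True))
```

In this model the thaw gets through with a read lock only as long as no
writer is queued; once TASK 2 is waiting, it makes no difference whether
thaw_super() asks for read or write access.
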
> freeze_fs has always been inherently dangerous if the userspace does
> not know what it's doing. If it freezes the root file system, and
> then while the file system is frozen, userspace attempts to modify
> /etc/mtab, it's going to lose. I've in the past argued for some kind
> of safety timeout that prevents the system from wedging, but the
> argument I've gotten back is (a) it's too complex, and (b) userspace
> programmers aren't that stupid, and (c) it could cause the filesystem
> to unfreeze when userspace wasn't expecting it. Oh, and (d) if the
> system wedges up due to userspace being stupid, it's acceptable.
>
> Obviously, if the kernel does something to itself that causes a
> deadlock, we need to fix it, but userspace doing something stupid has
> been explicitly ruled out of scope, at least in previous
> discussions...
>
> > And in particular ext4 has another deadlock of this kind because it does
> > IO from ext4_remount() e.g. when doing online resize (I know it's a bit
> > artificial but still ;).
>
> OK, I'm being dense again. How do remount and online resize relate
> to each other? And it's not I/O in general which is a problem, it's
> writeback activity which causes a problem because it takes a read lock
> on s_umount, right?

  The problem is starting a transaction while holding the s_umount
semaphore, or in fact any lock that thaw_super() (including the
per-filesystem ->unfreeze_fs() callback) needs. For ext4 this seems to be
sb->s_lock.
  I was actually wrong about ext4 online resizing via the resize option
causing possible deadlocks, because do_remount_sb() refuses to do anything
with the superblock while it is frozen... But still, if we ever happen to
start a transaction in ext4 while sb->s_lock is held, the deadlock with
the freezing code can happen, and that's just subtle and ugly IMHO.

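The dependency described above can be written down as a tiny wait-for
graph. This is only an illustration: the task names "writer" and "thaw" are
hypothetical, and find_cycle() is a helper invented for this sketch, not
something from the kernel.

```python
def find_cycle(waits_for):
    """Follow waits-for edges from each task; return a cycle or None.

    waits_for maps a task to the single task it is blocked on.
    """
    for start in waits_for:
        path = []
        node = start
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            # We came back to a task already on the path: deadlock.
            return path[path.index(node):]
    return None


# "writer" holds sb->s_lock and starts a transaction, which sleeps until
# the filesystem is thawed. "thaw" needs sb->s_lock (assumed here to be
# taken in the unfreeze path), which "writer" still holds.
frozen_with_lock_held = {"writer": "thaw", "thaw": "writer"}

# If the transaction is started without the lock held, the thaw task is
# not blocked on anyone, so the writer is eventually woken up.
frozen_without_lock = {"writer": "thaw"}

if __name__ == "__main__":
    print(find_cycle(frozen_with_lock_held))
    print(find_cycle(frozen_without_lock))
```

The cycle only disappears when the lock needed by the unfreeze path is
never held across the start of a transaction.
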
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR