[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110915162212.GC16195@quack.suse.cz>
Date: Thu, 15 Sep 2011 18:22:12 +0200
From: Jan Kara <jack@...e.cz>
To: Valerie Aurora <val@...consulting.com>
Cc: Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
Dave Chinner <david@...morbit.com>,
Masayoshi MIZUMA <m.mizuma@...fujitsu.com>,
Greg Freemyer <greg.freemyer@...il.com>,
linux-ext4@...r.kernel.org, Eric Sandeen <sandeen@...hat.com>
Subject: Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock
On Wed 14-09-11 16:53:38, Valerie Aurora wrote:
> On Wed, Sep 14, 2011 at 7:00 AM, Jan Kara <jack@...e.cz> wrote:
> > On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> index 04cf3b9..d1dca03 100644
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
> >> long write_chunk;
> >> long wrote = 0; /* count both pages and inodes */
> >>
> >> + if (vfs_is_frozen(sb))
> >> + return 0;
> >> +
> > Umm, maybe we could make this more robust by skipping the superblock in
> > __writeback_inodes_wb() and just explicitely stopping the writeback when
> > work->sb is set (i.e. writeback is required only for frozen sb) in
> > wb_writeback()?
>
> Sorry, I don't quite understand what the goal is here? I'm happy to
> make the change, just want to make sure I'm accomplishing what you
> want.
My worry is, that when we just bail out from writeback_sb_inodes(), we
leave dirty inodes on b_io list. That is a unique behavior in writeback
code since all other places either put inode to b_more_io list for a quick
retry or redirty the inode to retry it later.
Emptyness of b_io list is used for example to detect whether we should
queue more inodes to b_io list and also busylooping logic kind of doesn't
count with inodes staying at b_io list. So although I don't see an
immediate problem with your solution it does add a new special case.
After some more thought I think the cleanest would be to just call
redirty_tail() if an inode is on frozen superblock in the beginning of the
loop in writeback_sb_inodes(). It won't be as efficient but much more in
line how the rest of the writeback code works.
> >> while (!list_empty(&wb->b_io)) {
> >> struct inode *inode = wb_inode(wb->b_io.prev);
> >>
> >> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> >> * writeback_inodes_sb_if_idle - start writeback if none underway
> >> * @sb: the superblock
> >> *
> >> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> >> - * Returns 1 if writeback was started, 0 if not.
> >> + * Invoke writeback_inodes_sb if no writeback is currently underway
> >> + * and no one else holds the s_umount lock. Returns 1 if writeback
> >> + * was started, 0 if not.
> >> */
> >> int writeback_inodes_sb_if_idle(struct super_block *sb)
> >> {
> >> if (!writeback_in_progress(sb->s_bdi)) {
> >> - down_read(&sb->s_umount);
> >> - writeback_inodes_sb(sb);
> >> - up_read(&sb->s_umount);
> >> - return 1;
> >> - } else
> >> - return 0;
> >> + if (down_read_trylock(&sb->s_umount)) {
> > What's exactly the deadlock trylock protects from here? Or is it just an
> > optimization?
>
> The trylock is an optimization Dave Chinner suggested. The first
> version I wrote acquired the lock and then checked vfs_is_frozen().
>
> This function and the similar following one each have one caller,
> btrfs in one case and ext4 in the other, and they are both trying to
> get more writes to go out to disk in the hope of freeing up disk
> space.
Ah right. Maybe a short comment that it's just an optimization for
opportunictic writeout would be nice.
> >> + writeback_inodes_sb(sb);
> >> + up_read(&sb->s_umount);
> >> + return 1;
> >> + }
> >> + }
> >> + return 0;
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists