Message-ID: <20090929161339.GB11573@duck.suse.cz>
Date: Tue, 29 Sep 2009 18:13:39 +0200
From: Jan Kara <jack@...e.cz>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Jens Axboe <jens.axboe@...cle.com>,
Chris Mason <chris.mason@...cle.com>,
linux-kernel@...r.kernel.org, jack@...e.cz, npiggin@...e.de
Subject: Re: [PATCH] bdi_sync_writeback should WB_SYNC_NONE first
On Sun 27-09-09 10:10:35, Andrew Morton wrote:
> On Sun, 27 Sep 2009 18:55:14 +0200 Jens Axboe <jens.axboe@...cle.com> wrote:
>
> > > I wasn't referring to this patch actually. The code as it stands in
> > > Linus's tree right now attempts to write back up to 2^63 pages...
> >
> > I agree, it could make the fs sync take a looong time. This is not a new
> > issue, though.
>
> It _should_ be a new issue. The old code would estimate the number of
> dirty pages up-front and would then add a +50% fudge factor, so if we
> started the sync with 1GB dirty memory, we write back a max of 1.5GB.
>
> However that might have got broken.
>
> void sync_inodes_sb(struct super_block *sb, int wait)
> {
> 	struct writeback_control wbc = {
> 		.sync_mode	= wait ? WB_SYNC_ALL : WB_SYNC_NONE,
> 		.range_start	= 0,
> 		.range_end	= LLONG_MAX,
> 	};
>
> 	if (!wait) {
> 		unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
> 		unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
>
> 		wbc.nr_to_write = nr_dirty + nr_unstable +
> 				(inodes_stat.nr_inodes - inodes_stat.nr_unused);
> 	} else
> 		wbc.nr_to_write = LONG_MAX; /* doesn't actually matter */
>
> 	sync_sb_inodes(sb, &wbc);
> }
>
> a) the +50% isn't there in 2.6.31
>
> b) the wait=true case appears to be vulnerable to livelock in 2.6.31.
>
> whodidthat
>
> 38f21977663126fef53f5585e7f1653d8ebe55c4 did that back in January.
Yes, I even remember the (long) discussion:
http://lkml.org/lkml/2008/9/22/361
The problem with using .nr_to_write in WB_SYNC_ALL mode is that
write_cache_pages() can return before it has actually written the pages we
wanted to write (because someone dirtied other pages in the meantime and
used up the budget). So Nick removed the .nr_to_write limit instead, to
guarantee proper data integrity semantics. I'm not sure what the status is
of the livelock fixes he proposed (added him to CC).
Maybe we could at least fix some cases of possible livelock by setting
.range_end to the current i_size when calling writepages(). That would
cover the common case where somebody is extending the file. For the case
where someone writes into a huge hole, we would at least provide a finite
upper bound, although in practice it might be much the same as without
.range_end.
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--