[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080805132133.GA15568@skywalker>
Date: Tue, 5 Aug 2008 18:51:33 +0530
From: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
To: Theodore Tso <tytso@....edu>
Cc: linux-ext4@...r.kernel.org
Subject: Re: Problem with delayed allocation
On Tue, Aug 05, 2008 at 12:22:17PM +0530, Aneesh Kumar K.V wrote:
> On Tue, Aug 05, 2008 at 02:44:28AM -0400, Theodore Tso wrote:
> > On Mon, Aug 04, 2008 at 10:05:05PM +0530, Aneesh Kumar K.V wrote:
> > >
> > > This is the complete patch that I have. I haven't fully tested it
> > > (right now waiting for the machine to be free). This should apply
> > > after stable-boundary-undo.patch
> >
> > Umm... the patch doesn't apply right after the stable boundary udo
> > patch.
> >
> > - Ted
>
> I did a fresh git pull and updated the patch. I also accumulated few
> changes after words while testing on ABAT. Attaching both the patches
> below. The patches apply after ext4_journal_credits_fix_for_writepages.patch
> in the patch queue.
I still see the problem with the below changes. Now that i have read
the writeback path more closely I am not sure how it will guarantee
that all dirty pages of the inode are written back to disk before
generic_sync_sb_inodes return.
.....
....
> @@ -2202,10 +2224,7 @@ static int ext4_da_writepages(struct address_space *mapping,
> int ret = 0;
> long to_write;
> loff_t range_start = 0;
> - int blocks_per_page = PAGE_CACHE_SIZE >> inode->i_blkbits;
> - int max_credit_blocks = ext4_journal_max_transaction_buffers(inode);
> - int need_credits_per_page = ext4_writepages_trans_blocks(inode, 1);
> - int max_writeback_pages = (max_credit_blocks / blocks_per_page) / need_credits_per_page;
> + long pages_skipped = 0;
>
> /*
> * No pages to write? This is mainly a kludge to avoid starting
> @@ -2215,11 +2234,6 @@ static int ext4_da_writepages(struct address_space *mapping,
> if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
> return 0;
>
> - if (wbc->nr_to_write > mapping->nrpages)
> - wbc->nr_to_write = mapping->nrpages;
> -
> - to_write = wbc->nr_to_write;
> -
> if (!wbc->range_cyclic) {
> /*
> * If range_cyclic is not set force range_cont
> @@ -2228,26 +2242,21 @@ static int ext4_da_writepages(struct address_space *mapping,
> wbc->range_cont = 1;
> range_start = wbc->range_start;
> }
> + pages_skipped = wbc->pages_skipped;
>
> - while (!ret && to_write) {
> - /*
> - * set the max dirty pages could be write at a time
> - * to fit into the reserved transaction credits
> - */
> - if (wbc->nr_to_write > max_writeback_pages)
> - wbc->nr_to_write = max_writeback_pages;
> +restart_loop:
> + to_write = wbc->nr_to_write;
> + while (!ret && to_write > 0) {
>
....
.....
> * or we requested for a noblocking writeout
> @@ -2288,6 +2304,15 @@ static int ext4_da_writepages(struct address_space *mapping,
> wbc->nr_to_write = to_write;
> }
>
> + if (wbc->range_cont && (pages_skipped != wbc->pages_skipped)) {
> + /* We skipped pages in this loop */
> + wbc->range_start = range_start;
> + wbc->nr_to_write = to_write +
> + wbc->pages_skipped - pages_skipped;
> + wbc->pages_skipped = pages_skipped;
> + goto restart_loop;
> + }
> +
This should not be needed. I was trying to force the pages to writeback.
generic_sync_sb_inodes actually move the inode to s_dirty if the
pages_skipped differ after a writeback. But the confusing part is we
are not looking at s_dirty list again. We move s_dirty and s_more_io to s_io
only once in queue_io
> out_writepages:
> wbc->nr_to_write = to_write;
> if (range_start)
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 8d62200..023e1a8 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1790,6 +1790,13 @@ static void mpage_da_map_blocks(struct mpage_da_data *mpd)
> new.b_state = lbh->b_state;
> new.b_blocknr = 0;
> new.b_size = lbh->b_size;
> +
> + /*
> + * If we didn't accumulate anything
> + * to write simply return
> + */
> + if (!new.b_size)
> + return;
> err = mpd->get_block(mpd->inode, next, &new, 1);
> if (err)
> return;
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 25adfc3..a7db10c 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -517,8 +517,12 @@ void generic_sync_sb_inodes(struct super_block *sb,
> cond_resched();
> spin_lock(&inode_lock);
> if (wbc->nr_to_write <= 0) {
> - wbc->more_io = 1;
> - break;
> + if (wbc->sync_mode == WB_SYNC_ALL) {
> + wbc->nr_to_write = LONG_MAX;
> + } else {
> + wbc->more_io = 1;
> + break;
> + }
> }
> if (!list_empty(&sb->s_more_io))
> wbc->more_io = 1;
This also should not be done. I guess we need to look at core writeback
code more closely.
-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists