Message-ID: <20090922113055.GA13887@duck.suse.cz>
Date: Tue, 22 Sep 2009 13:30:55 +0200
From: Jan Kara <jack@...e.cz>
To: Wu Fengguang <fengguang.wu@...el.com>
Cc: Chris Mason <chris.mason@...cle.com>, Theodore Tso <tytso@....edu>,
Jens Axboe <jens.axboe@...cle.com>,
Christoph Hellwig <hch@...radead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"jack@...e.cz" <jack@...e.cz>
Subject: Re: [PATCH 0/7] Per-bdi writeback flusher threads v20
On Tue 22-09-09 18:13:35, Wu Fengguang wrote:
> Yes, a more general solution would help. I'd like to propose one which
> works the other way round. In brief,
> (1) the VFS gives a large enough per-file writeback quota to btrfs;
> (2) btrfs tells the VFS "here is a (seek) boundary, stop voluntarily"
>     before the quota is exhausted and it is force-stopped.
>
> There will be two limits (the second one is new):
>
> - total nr to write in one wb_writeback invocation
> - _max_ nr to write per file (before switching to sync the next inode)
>
> The per-invocation limit is useful for balance_dirty_pages().
> The per-file number can be accumulated across successive wb_writeback
> invocations and thus can be much larger (e.g. 128MB) than the legacy
> per-invocation number.
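For reference, the two limits could be expressed as wbc fields roughly
like this (a sketch only; the names are hypothetical, not the real
struct writeback_control):

struct wbc_quota {
	long nr_to_write;	/* per-invocation limit, as today */
	long per_file_max;	/* new: pages allowed per inode before
				 * switching to the next one; can be much
				 * larger, e.g. 128MB worth of pages */
};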
Actually, it doesn't make much sense to have a per-file limit expressed
as a number of pages. I've been playing with the idea that we could have a
per-file *time* quota instead. That would have the advantage that if a file
generates random IO, we wouldn't block on it for longer than when it
generates linear IO.
I imagine that in ->writepage we would subtract from the time quota
carried in the wbc the time it takes to write the current page. That would
need some context in the wbc so that we can tell whether the IO is linear
or random and account for a seek penalty appropriately, but generally it
seems doable...
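A rough sketch of what that accounting could look like (all names and
cost numbers below are made up; a real version would need per-device
calibration):

#include <stdbool.h>

/*
 * Hypothetical time-quota context carried in the wbc (sketch only,
 * not a real kernel structure).
 */
struct wbc_time_quota {
	long time_left_us;		/* remaining quota, microseconds */
	unsigned long last_index;	/* index of the last page written */
	bool have_last;			/* have we written a page yet? */
};

/*
 * Charge one page against the quota.  A seek penalty is added when the
 * page is not contiguous with the previous one; the costs are made-up
 * numbers standing in for calibrated estimates.
 */
static void wbc_charge_page(struct wbc_time_quota *q, unsigned long index)
{
	long cost_us = 50;		/* assumed linear per-page cost */

	if (q->have_last && index != q->last_index + 1)
		cost_us += 8000;	/* assumed seek penalty */

	q->time_left_us -= cost_us;
	q->last_index = index;
	q->have_last = true;
}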
Filesystems implementing ->writepages could then decide whether they
have enough time quota left to seek to the next extent and write it out,
or whether they should rather yield to other inodes...
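That is, something along these lines (hypothetical helper; est_cost_us
would come from a per-device estimate of the seek plus write time for
the next extent):

#include <stdbool.h>

/*
 * Sketch: should ->writepages start on the next extent, or yield to
 * other inodes because the remaining quota won't cover it?
 */
static bool should_write_next_extent(long time_left_us, long est_cost_us)
{
	return time_left_us >= est_cost_us;
}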
> The file system will only see the per-file numbers. The "max" means
> that if btrfs finds the current page to be the last page in the extent,
> it can indicate this fact to the VFS by setting wbc->would_seek=1. The
> VFS will then switch to writing the next inode.
>
> The benefit of an early voluntary yield is that it reduces the chance
> of being force-stopped halfway through an extent. The next time the VFS
> returns to sync this inode, it will again be granted the full 128MB
> quota, which should be enough to cover a big fresh extent.
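For illustration, the handshake described above might look roughly like
this (would_seek as proposed; everything else here is a hypothetical
sketch, not actual kernel code):

#include <stdbool.h>

/* Sketch of the proposed handshake; names are hypothetical. */
struct wbc_hint {
	long nr_to_write;
	bool would_seek;	/* set by the fs at an extent boundary */
};

/* Filesystem side: called after writing the last page of an extent. */
static void fs_mark_extent_boundary(struct wbc_hint *wbc)
{
	wbc->would_seek = true;	/* "stop here voluntarily" */
}

/*
 * VFS side: switch to the next inode early instead of force-stopping
 * the filesystem halfway through an extent.
 */
static bool vfs_should_switch_inode(const struct wbc_hint *wbc)
{
	return wbc->would_seek || wbc->nr_to_write <= 0;
}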
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR