[<prev] [next>] [day] [month] [year] [list]
Message-ID: <4790570E.80709@google.com>
Date: Thu, 17 Jan 2008 23:36:46 -0800
From: Mike Waychison <mikew@...gle.com>
To: Fengguang Wu <wfg@...l.ustc.edu.cn>
CC: Andrew Morton <akpm@...ux-foundation.org>,
Michael Rubin <mrubin@...gle.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch] Converting writeback linked lists to a tree based data
structure
Fengguang Wu wrote:
> On Tue, Jan 15, 2008 at 09:51:49PM -0800, Andrew Morton wrote:
>> On Wed, 16 Jan 2008 12:55:07 +0800 Fengguang Wu <wfg@...l.ustc.edu.cn> wrote:
>>
>>> On Tue, Jan 15, 2008 at 08:42:36PM -0800, Andrew Morton wrote:
>>>> On Wed, 16 Jan 2008 12:25:53 +0800 Fengguang Wu <wfg@...l.ustc.edu.cn> wrote:
>>>>
>>>>> list_heads are OK if we use them for one and only function.
>>>> Not really. They're inappropriate when you wish to remember your
>>>> position in the list while you dropped the lock (as we must do in
>>>> writeback).
>>>>
>>>> A data structure which permits us to interate across the search key rather
>>>> than across the actual storage locations is more appropriate.
>>> I totally agree with you. What I mean is to first do the split of
>>> functions - into three: ordering, starvation prevention, and blockade
>>> waiting.
>> Does "ordering" here refer to ordering bt time-of-first-dirty?
>
> Ordering by dirtied_when or i_ino, either is OK.
>
>> What is "blockade waiting"?
>
> Some inodes/pages cannot be synced now for some reason and should be
> retried after a while.
>
>>> Then to do better ordering by adopting radix tree(or rbtree
>>> if radix tree is not enough),
>> ordering of what?
>
> Switch from time to location.
>
Given the way LBAs are located on disk and the fact that rotational
latency is a large factor in changing locations of a drive head, any
attempts to do a C-SCAN pass are pretty much useless. Further
complicating this is any volume management that sits between the fs and
the actual storage.
A nice feature to have longer term is to have the write_inodes paths for
background flushing understand storage congestion _through_ any volume
management. This would allow us to back off background flushing on a per
spindle basis (when using drives of course) and avoid write congestion
in both the io scheduler and in the drive's writecaches, which I
believe, but don't have hard evidence, get congested today, knocking the
drive into a fifo fashion in firmware.
A data structure that allows us to keep a dirtied_when values consistent
across back-offs and blocking allows us to further develop the
background writeout paths to get to this point (though exposing this
congestion information will require more work deeper in the stack).
>>> and lastly get rid of the list_heads to
>>> avoid locking. Does it sound like a good path?
>> I'd have thaought that replacing list_heads with another data structure
>> would be a simgle commit.
>
> That would be easy. s_more_io and s_more_io_wait can all be converted
> to radix trees.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists