Message-ID: <20121120071908.GA6078@gmail.com>
Date: Tue, 20 Nov 2012 15:19:08 +0800
From: Zheng Liu <gnehzuil.liu@...il.com>
To: linux-ext4@...r.kernel.org
Subject: [RFC] extent status tree (step2)
Hi all,
The extent status tree has been applied to linux-next, so I have begun
implementing the second step of the extent status tree [1]. In this step the
following improvements will be added:
- track the status of all extents for an inode
- improve delayed allocation space reservation
- reduce contention on i_data_sem w/ delalloc
- big extent tree (accelerate lookups in the extent tree)
* Track all extent status for an inode
Currently the extent status tree only records the status of delayed extents. In
this step, it will be extended to track the status of every extent of an inode.
The possible statuses are DELAY, WRITTEN, and UNWRITTEN. When an application
opens a file, its extent status tree starts out empty. Each time a get_block_t
function is called, the extent status is inserted into this tree. So after some
time the tree tracks most of the inode's extent entries.
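To make the idea concrete, here is a minimal userspace sketch of such a status map. It is not the kernel implementation: it uses a small sorted array where the real tree would be something like an rb-tree, and all names (es_status, es_entry, es_tree, es_insert, es_lookup) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three statuses named above. */
enum es_status { ES_DELAY, ES_WRITTEN, ES_UNWRITTEN };

/* One tracked extent: a run of logical blocks sharing one status. */
struct es_entry {
    uint32_t lblk;              /* first logical block */
    uint32_t len;               /* number of blocks */
    enum es_status status;
};

#define ES_MAX 64               /* arbitrary capacity for this sketch */

struct es_tree {
    struct es_entry e[ES_MAX];  /* sorted by lblk, non-overlapping */
    size_t n;                   /* 0 right after "open" */
};

/*
 * Record an extent, e.g. from a get_block-style mapping call.
 * The caller guarantees it does not overlap an existing entry.
 * Returns 0 on success, -1 if the sketch's fixed array is full.
 */
static int es_insert(struct es_tree *t, uint32_t lblk, uint32_t len,
                     enum es_status status)
{
    size_t i;

    assert(len > 0);
    if (t->n == ES_MAX)
        return -1;
    /* Shift larger entries right to keep the array sorted. */
    for (i = t->n; i > 0 && t->e[i - 1].lblk > lblk; i--)
        t->e[i] = t->e[i - 1];
    t->e[i] = (struct es_entry){ lblk, len, status };
    t->n++;
    return 0;
}

/* Binary search for the entry covering lblk; NULL if it is not tracked. */
static const struct es_entry *es_lookup(const struct es_tree *t, uint32_t lblk)
{
    size_t lo = 0, hi = t->n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct es_entry *e = &t->e[mid];

        if (lblk < e->lblk)
            hi = mid;
        else if (lblk - e->lblk >= e->len)
            lo = mid + 1;
        else
            return e;
    }
    return NULL;
}
```

A lookup that returns NULL only means the extent has not been seen yet, not that the block is a hole, so the caller would still have to fall back to the on-disk extent tree.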
* Improve delay allocation space reservation
Currently we hit a warning in some specific stress tests w/ bigalloc and
delalloc. The reason is that we need to reserve space for delayed allocation,
and with bigalloc enabled this accounting is complicated. So we can use the
extent status tree to track how much space we need to reserve.
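As an illustration of why tracking delayed extents helps here: with bigalloc, space is allocated in clusters of several blocks, so two delayed extents that fall into the same cluster need only one reserved cluster. The sketch below counts distinct clusters over a sorted list of delayed extents. The names (delay_extent, clusters_to_reserve, cluster_bits) are hypothetical, and the real ext4 accounting involves more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A delayed (not yet allocated) run of logical blocks. */
struct delay_extent {
    uint32_t lblk;
    uint32_t len;
};

/*
 * Count the distinct clusters touched by delayed extents.
 * ext[] must be sorted by lblk and non-overlapping.
 * cluster_bits is log2(blocks per cluster); 0 means no bigalloc.
 */
static uint32_t clusters_to_reserve(const struct delay_extent *ext, size_t n,
                                    unsigned cluster_bits)
{
    uint32_t total = 0;
    uint32_t last = 0;      /* last cluster already counted */
    int have_last = 0;

    for (size_t i = 0; i < n; i++) {
        assert(ext[i].len > 0);
        uint32_t first = ext[i].lblk >> cluster_bits;
        uint32_t end = (ext[i].lblk + ext[i].len - 1) >> cluster_bits;

        /* Skip a cluster shared with the previous extent. */
        if (have_last && first <= last)
            first = last + 1;
        if (first <= end)
            total += end - first + 1;
        last = end;
        have_last = 1;
    }
    return total;
}
```

For example, with 16-block clusters (cluster_bits = 4), delayed extents at blocks 0-3 and 8-11 share cluster 0 and need a single reserved cluster rather than two.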
* Reduce the race contention of i_data_sem w/ delalloc
When delalloc is enabled, the filesystem accumulates more blocks that are
waiting to be written out. That gives us a more contiguous file layout and
higher throughput. In one specific case, however, it causes a huge latency for
the application. When an app does some append writes, it only needs to wait a
moment if the flusher is asleep and is not writing any dirty pages out. But when
the flusher tries to write these dirty pages, i_data_sem is held for a long
time with delalloc because the filesystem needs to allocate lots of blocks for
these pages. If the app does another append write at the same time, the
filesystem tries to take i_data_sem too, because it needs to determine whether
or not some blocks have been allocated. So the app has to wait a long time to
finish this write. That is unacceptable for latency-sensitive applications.
In this step, we can modify the get_block_t functions to look up the extent
status tree. When the filesystem needs to find a block mapping, it will look up
the extent status tree first. We only need to take a rwlock and can avoid
waiting for a long time.
* Big extent tree
At this year's ext4 developer workshop, Ted and other folks discussed a big
extent cache [2]. The idea is that multiple extent entries are collapsed into
a single entry in memory. It works like a cache for the extent tree, and can
reduce memory cost and speed up looking up an extent entry. It seems that the
extent status tree can do this as well.
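A rough sketch of the collapsing step, assuming that two entries can be merged when they are adjacent both logically and physically. The names (mem_extent, collapse_extents) are hypothetical, and a real implementation would also have to compare extent status and guard against length overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_extent {
    uint32_t lblk;      /* first logical block */
    uint32_t len;       /* number of blocks */
    uint64_t pblk;      /* first physical block */
};

/*
 * Collapse runs of contiguous extents in place.
 * ext[] must be sorted by lblk and non-overlapping.
 * Returns the number of entries left after merging.
 */
static size_t collapse_extents(struct mem_extent *ext, size_t n)
{
    size_t out = 0;     /* index of the entry currently being grown */

    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++) {
        struct mem_extent *cur = &ext[out];

        assert(ext[i].lblk >= cur->lblk + cur->len);
        if (ext[i].lblk == cur->lblk + cur->len &&
            ext[i].pblk == cur->pblk + cur->len)
            cur->len += ext[i].len;     /* contiguous: extend cur */
        else
            ext[++out] = ext[i];        /* gap: start a new entry */
    }
    return out + 1;
}
```

With fewer entries in memory, a lookup has less to search and the cache takes less space, which matches the two benefits mentioned above.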
Ted, if you have any updates on the big extent cache, or if I have
misunderstood something, please let me know. Thanks!
Any comments or feedback are appreciated.
Thanks!
- Zheng
---
1. http://pl.digipedia.org/usenet/thread/11916/30410/
2. http://www.spinics.net/lists/linux-ext4/msg31742.html