Message-ID: <20130108022543.GA3732@gmail.com>
Date: Tue, 8 Jan 2013 10:25:43 +0800
From: Zheng Liu <gnehzuil.liu@...il.com>
To: Dave Chinner <david@...morbit.com>
Cc: Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org, Zheng Liu <wenqing.lz@...bao.com>
Subject: Re: [RFC][PATCH 3/9 v1] ext4: add physical block and status member into extent status tree

On Tue, Jan 08, 2013 at 12:27:54PM +1100, Dave Chinner wrote:
> On Sat, Jan 05, 2013 at 10:44:01AM +0800, Zheng Liu wrote:
> > On Wed, Jan 02, 2013 at 12:22:55PM +0100, Jan Kara wrote:
> > > On Tue 01-01-13 13:16:07, Zheng Liu wrote:
> > > > On Mon, Dec 31, 2012 at 10:49:52PM +0100, Jan Kara wrote:
> > > > > On Mon 24-12-12 15:55:36, Zheng Liu wrote:
> > > > > > From: Zheng Liu <wenqing.lz@...bao.com>
> > > > > >
> > > > > > > es_pblk records the physical block on disk that the extent maps to.
> > > > > > > es_status records the status of the extent. Three statuses are defined:
> > > > > > > written, unwritten and delayed.
> > > > > > So this means one extent is 48 bytes on 64-bit architectures. If I'm a
> > > > > > nasty user and create an artificially fragmented file (by allocating
> > > > > > every second block), the extent tree takes 6 MB per GB of file. That's
> > > > > > quite a bit, and I think you need to provide a way for the kernel to
> > > > > > reclaim extent structures...
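The 6 MB per GB figure above can be reproduced with a quick calculation. This is only a rough sketch: the 48-byte extent size comes from the message above, while the 4 KiB block size is an assumption (the thread does not state it), and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 48 bytes per cached extent on 64-bit, as stated in the thread. */
#define ES_EXTENT_BYTES 48u
/* ASSUMPTION: 4 KiB filesystem blocks; the thread does not give a block size. */
#define BLOCK_BYTES 4096u

/*
 * Worst case described above: every second block is allocated, so each
 * allocated block ends up as its own single-block extent.
 */
static uint64_t worst_case_extents(uint64_t file_bytes)
{
	uint64_t blocks = file_bytes / BLOCK_BYTES;

	return blocks / 2;
}

/* Memory needed to cache all of those extents (hypothetical helper). */
static uint64_t status_tree_bytes(uint64_t file_bytes)
{
	return worst_case_extents(file_bytes) * ES_EXTENT_BYTES;
}
```

Under these assumptions a 1 GiB file yields 131072 extents, and 131072 * 48 bytes is exactly 6 MiB, which matches the estimate.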
> > > >
> > > > > Indeed, when a file is heavily fragmented, the status tree will occupy
> > > > > a lot of memory. That is why it is loaded on demand. When I wrote
> > > > > it, there were two ways to load the status tree: loading it on
> > > > > demand, or loading the complete extent tree in ext4_alloc_inode().
> > > > > In the end I chose the former because it reduces memory pressure
> > > > > most of the time. Its disadvantage is that the status tree cannot
> > > > > be fully trusted, because it does not track the complete state of
> > > > > the extent tree on disk.
> > > Not reading the whole extent tree in ext4_alloc_inode() is a good start
> > > but it's not the whole solution IMHO. It saves us from unnecessary reading
> > > of extents but still if someone reads the whole filesystem (like
> > > grep -R "foo" /) you will still end up with all extents cached. And that
> > > will make ext4 inodes pretty heavy in memory. Surely inode reclaim will
> > > eventually release these inodes including cached extents but it is usually
> > > more beneficial to cache the inode itself than more extents so allowing us
> > > to strip cached extents without releasing inode itself would be good.
> > >
> > > > > I will provide a way to reclaim extent structures from the status tree.
> > > > > My current idea is to reclaim all extents in WRITTEN/UNWRITTEN status,
> > > > > because we always need the DELAYED extents in the fiemap,
> > > > > seek_data/hole and bigalloc code. Furthermore, as you said in
> > > > > another mail, unwritten extents that are going to be converted to
> > > > > written should not be reclaimed either.
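The reclaim rule proposed above can be written down as a small predicate. This is a minimal sketch, not code from the patch: es_pblk and es_status are the member names from the patch description, but the enum values, the struct layout, and the conversion_pending flag are hypothetical and exist only to illustrate the rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values; the three states are the ones named in the thread. */
enum es_status_sketch {
	ES_SKETCH_WRITTEN,
	ES_SKETCH_UNWRITTEN,
	ES_SKETCH_DELAYED,
};

/* Hypothetical, simplified extent entry. */
struct es_entry_sketch {
	unsigned long long es_pblk;      /* physical block the extent maps to */
	enum es_status_sketch es_status; /* written, unwritten or delayed */
	bool conversion_pending;         /* ASSUMPTION: unwritten extent waiting
					  * to be converted to written */
};

/* Returns true if the entry may be dropped from the in-memory status tree. */
static bool es_can_reclaim(const struct es_entry_sketch *es)
{
	/* Delayed extents are always needed (fiemap, seek_data/hole, bigalloc). */
	if (es->es_status == ES_SKETCH_DELAYED)
		return false;

	/* Unwritten extents about to be converted must stay as well. */
	if (es->es_status == ES_SKETCH_UNWRITTEN && es->conversion_pending)
		return false;

	/* Plain written or unwritten extents can be reloaded from disk later. */
	return true;
}
```

How a real implementation would track a pending conversion is not specified in the thread; the boolean flag here is just a placeholder for that information.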
> > > >
> > > > > Another question is when these extents should be reclaimed. Currently
> > > > > the whole status tree is reclaimed when clear_inode() is called. Maybe
> > > > > a switch in sysfs is an option. Any thoughts?
> > > > The natural way to handle the shrinking is to use the 'shrinker' framework.
> > > > In this case, we could register a shrinker for shrinking extents. Just
> > > > having an LRU of extents would increase the size of the extent structure
> > > > by 2 pointers, which is too big I'd think, and I'm not yet sure how to
> > > > choose extents for reclaim in some other way. I will think about it...
> >
> > Hi Jan,
> >
> > Sorry for the delay. The 'shrinker' framework is an option. We can define
> > a callback function to reclaim extents from the status tree. When we access
> > an extent in an inode, we move this inode to the tail of the LRU list.
> > But this approach has a drawback: the spinlock that protects the LRU list
> > will be heavily contended because all inodes need to take it. I
> > guess this overhead is unacceptable for us. Any comments?
>
> Measure it first. There are several filesystem global locks still
> in existence at the VFS level. Solve the simple problem first, and
> then the hard problem might get solved for you by someone else. e.g:
>
> http://oss.sgi.com/archives/xfs/2012-11/msg00643.html
Thanks for the advice. :-) I will measure the overhead first.
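
The LRU scheme discussed above, where every extent access moves the owning inode to the tail of one shared list, can be sketched in plain userspace C as follows. All names here are hypothetical and the lock is only indicated by comments. In the kernel this would more likely be a struct list_head with list_move_tail() under a spinlock, and that critical section is what would need measuring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive list node, one per inode that has cached extents. */
struct lru_node {
	struct lru_node *prev, *next;
};

/* Make a head or a detached node point at itself. */
static void lru_init(struct lru_node *n)
{
	n->prev = n->next = n;
}

/* Unlink a node; safe on a detached (self-linked) node too. */
static void lru_del(struct lru_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static void lru_add_tail(struct lru_node *head, struct lru_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Called on every extent access. In a real implementation the two list
 * operations below would run with the global LRU lock held, so this is
 * the section every inode would contend on.
 */
static void lru_touch(struct lru_node *head, struct lru_node *n)
{
	/* lock(global LRU lock) would go here */
	lru_del(n);
	lru_add_tail(head, n);
	/* unlock(global LRU lock) would go here */
}

/* Least recently used node, i.e. the first reclaim candidate; NULL if empty. */
static struct lru_node *lru_oldest(struct lru_node *head)
{
	return head->next == head ? NULL : head->next;
}
```

A shrinker callback would then walk the list from lru_oldest() and strip extents from those inodes first.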
Regards,
- Zheng