Message-ID: <20140130031401.GD8798@birch.djwong.org>
Date: Wed, 29 Jan 2014 19:14:01 -0800
From: "Darrick J. Wong" <darrick.wong@...cle.com>
To: "Theodore Ts'o" <tytso@....edu>
Cc: linux-ext4@...r.kernel.org
Subject: Re: FAST paper on ffsck
On Wed, Jan 29, 2014 at 10:57:41AM -0800, Darrick J. Wong wrote:
> On Mon, Dec 09, 2013 at 01:01:49PM -0500, Theodore Ts'o wrote:
> > Andreas brought up on today's conference call Kirk McKusick's recent
> > changes[1] to try to improve fsck times for FFS, in response to the
> > recent FAST paper covering fsck speed ups for ext3, "ffsck: The Fast
> > Filesystem Checker"[2]
> >
> > [1] http://www.mckusick.com/publications/faster_fsck.pdf
> > [2] https://www.usenix.org/system/files/conference/fast13/fast13-final52_0.pdf
> >
> > All of the changes which Kirk outlined are ones which we had done
> > several years ago, in the early days of ext4 development. I talked
> > about some of these in some blog entries, "Fast ext4 fsck times"[3], and
> > "Fast ext4 fsck times, revisited"[4]
> >
> > [3] http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/
> > [4] http://thunk.org/tytso/blog/2009/02/26/fast-ext4-fsck-times-revisited/
> >
> > (Apologies for the really bad formatting; I recovered my blog from
> > backups a few months ago, installed onto a brand-new Wordpress
> > installation --- since the old one was security bug ridden and
> > horribly obsolete --- and I haven't had a chance to fix up some of the
> > older blog entries that had explicit HTML for tables to work with the
> > new theme.)
> >
> > One further observation from reading the ffsck paper. Their method of
> > introducing heavy file system fragmentation resulted in a file system
> > where most of the files had external extent tree blocks; that is, the
> > trees had a depth > 1. I have not observed this in file systems under
> > normal load, since most files are written once and not rewritten, and
> > those that are rewritten (e.g., database files) are not the common
> > case, and even then, generally aren't written in a random append
> > workload where there are hundreds of files in the same directory which
> > are appended to in random order. So looking at a couple of file
> > systems' fsck -v output, I find results such as this:
> >
> > Extent depth histogram: 1229346/569/3
> > Extent depth histogram: 332256/141
> > Extent depth histogram: 23253/456
> >
> > ... where the first number is the number of inodes where all of the
> > extent information is stored in the inode itself, the second number
> > is the number of inodes with a single level of external extent tree
> > blocks, and so on.
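
Aside: if anyone wants to pull the same histogram out of a file system
programmatically rather than scraping e2fsck output, the depth number comes
straight out of libext2fs.  Here's a rough, untested sketch, assuming the
usual e2fsprogs development headers and the inode-scan/extent APIs -- an
illustration of where the number comes from, not a drop-in tool:

/* exthist.c: tally extent tree depth per in-use inode, roughly the way
 * e2fsck -v builds its "Extent depth histogram" line.
 * Build (assuming the e2fsprogs dev package is installed):
 *   cc -o exthist exthist.c -lext2fs -lcom_err
 */
#include <stdio.h>
#include <et/com_err.h>
#include <ext2fs/ext2_fs.h>
#include <ext2fs/ext2fs.h>

int main(int argc, char **argv)
{
	ext2_filsys fs;
	ext2_inode_scan scan;
	struct ext2_inode inode;
	struct ext2_extent_info info;
	ext2_extent_handle_t handle;
	ext2_ino_t ino;
	unsigned long hist[8] = { 0 };
	errcode_t err;
	int i, maxd = 0;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <device>\n", argv[0]);
		return 1;
	}
	err = ext2fs_open(argv[1], 0, 0, 0, unix_io_manager, &fs);
	if (err) {
		com_err(argv[0], err, "while opening %s", argv[1]);
		return 1;
	}
	err = ext2fs_open_inode_scan(fs, 0, &scan);
	if (err) {
		com_err(argv[0], err, "while starting inode scan");
		ext2fs_close(fs);
		return 1;
	}
	while (ext2fs_get_next_inode(scan, &ino, &inode) == 0 && ino != 0) {
		/* Only in-use, extent-mapped inodes have an extent tree. */
		if (!inode.i_links_count || !(inode.i_flags & EXT4_EXTENTS_FL))
			continue;
		if (ext2fs_extent_open(fs, ino, &handle))
			continue;
		/* max_depth is 0 when every extent fits in the inode itself. */
		if (ext2fs_extent_get_info(handle, &info) == 0 &&
		    info.max_depth < 8)
			hist[info.max_depth]++;
		ext2fs_extent_free(handle);
	}
	ext2fs_close_inode_scan(scan);
	ext2fs_close(fs);

	for (i = 0; i < 8; i++)
		if (hist[i])
			maxd = i;
	printf("Extent depth histogram: ");
	for (i = 0; i <= maxd; i++)
		printf("%s%lu", i ? "/" : "", hist[i]);
	printf("\n");
	return 0;
}

For scale: with 4k blocks, the inode's 60-byte i_block area holds a 12-byte
extent header plus four 12-byte entries, and each external extent block holds
about 340 entries, so a depth-1 tree already covers up to ~1360 extents --
which is why nearly everything lands in the first or second bucket unless a
file is severely fragmented.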
> >
> > As a result, I'm not seeing the fsck time degradation resulting from
> > file system aging, because, at least with my workloads, the file
> > system isn't getting fragmented enough to result in a large number of
> > inodes with external extent tree blocks.
> >
> > We could implement schemes to optimize fsck performance for heavily
> > fragmented file systems; some could be done with just e2fsck
> > optimizations, while others would require file system format changes.
> > However, it's not clear to me that it's worth it.
> >
> > If folks would like to help run some experiments, it would be useful to
> > run a test e2fsck on a partition: "e2fsck -Fnfvtt /dev/sdb1" and look
> > at the extent depth histogram and the I/O rates for the various e2fsck
> > passes (see below for an example).
> >
> > If you have examples where the file system has a very large number of
> > inodes with extent tree depths > 1, it would be useful to see these
> > numbers, along with a description of how old the file system is, and
> > what sort of workload might have contributed to its aging.
> >
>
> I don't know about "very large", but here's what I see on the server that I
> share with some friends. Afaik it's used mostly for VM images and test
> kernels... and other parallel-write-once files. ;) This FS has been running
> since Nov. 2012. That said, I think the VM images were created without
> fallocate; some of these files have tens of thousands of tiny extents.
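
For anyone reproducing that kind of aging: the application-side fix is just
to preallocate the image before writing into it.  A minimal sketch, assuming
Linux fallocate(2) and a made-up path and size, purely to illustrate:

/* prealloc.c: reserve a hypothetical 20 GiB image file in one call so the
 * extent tree stays shallow, instead of letting the file grow by tiny
 * appends.  Build with -D_FILE_OFFSET_BITS=64 on 32-bit.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const off_t size = 20LL << 30;	/* 20 GiB, made up for illustration */
	int fd = open("/tmp/test.img", O_WRONLY | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Ask the fs for all the blocks up front; with luck it hands back a
	 * handful of big extents rather than thousands of little ones. */
	if (fallocate(fd, 0, 0, size)) {
		perror("fallocate");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}

posix_fallocate() is the portable alternative, though its fallback path on
file systems without a real fallocate is much slower.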
>
> 5386404 inodes used (4.44%, out of 121307136)
> 22651 non-contiguous files (0.4%)
> 7433 non-contiguous directories (0.1%)
> # of inodes with ind/dind/tind blocks: 0/0/0
> Extent depth histogram: 5526723/1334/16
> 202583901 blocks used (41.75%, out of 485198848)
> 0 bad blocks
> 34 large files
>
> 5207070 regular files
> 313009 directories
> 576 character device files
> 192 block device files
> 11 fifos
> 1103023 links
> 94363 symbolic links (86370 fast symbolic links)
> 73 sockets
> ------------
> 6718317 files
>
> On my main dev box, which is entirely old photos, mp3s, VM images, and kernel
> builds, I see:
>
> 2155348 inodes used (2.94%, out of 73211904)
> 14923 non-contiguous files (0.7%)
> 1528 non-contiguous directories (0.1%)
> # of inodes with ind/dind/tind blocks: 0/0/0
> Extent depth histogram: 2147966/685/3
> 85967035 blocks used (29.36%, out of 292834304)
> 0 bad blocks
> 6 large files
>
> 1862617 regular files
> 284915 directories
> 370 character device files
> 59 block device files
> 6 fifos
> 609215 links
> 7454 symbolic links (6333 fast symbolic links)
> 24 sockets
> ------------
> 2764660 files
>
> Sadly, since I've left the LTC I no longer have access to tux1, which had a
> rather horrifically fragmented ext3. Its backup server, which created a Time
> Machine-like series of "snapshots" with rsync --link-dest, took days to fsck,
> despite being ext4.
Well, I got a partial report -- the fs containing ISO images produced this fsck
output. Not terribly helpful, alas.
561392 inodes used (0.21%)
14007 non-contiguous inodes (2.5%)
# of inodes with ind/dind/tind blocks: 93077/7341/74
440877945 blocks used (82.12%)
0 bad blocks
382 large files
492651 regular files
36414 directories
270 character device files
760 block device files
3 fifos
2514 links
31930 symbolic links (31398 fast symbolic links)
4 sockets
--------
564546 files
--D