Message-ID: <20140130031401.GD8798@birch.djwong.org>
Date: Wed, 29 Jan 2014 19:14:01 -0800
From: "Darrick J. Wong" <darrick.wong@...cle.com>
To: "Theodore Ts'o" <tytso@....edu>
Cc: linux-ext4@...r.kernel.org
Subject: Re: FAST paper on ffsck
On Wed, Jan 29, 2014 at 10:57:41AM -0800, Darrick J. Wong wrote:
> On Mon, Dec 09, 2013 at 01:01:49PM -0500, Theodore Ts'o wrote:
> > Andreas brought up on today's conference call Kirk McKusick's recent
> > changes[1] to try to improve fsck times for FFS, in response to the
> > recent FAST paper covering fsck speed ups for ext3, "ffsck: The Fast
> > Filesystem Checker"[2]
> >
> > [1] http://www.mckusick.com/publications/faster_fsck.pdf
> > [2] https://www.usenix.org/system/files/conference/fast13/fast13-final52_0.pdf
> >
> > All of the changes which Kirk outlined are ones which we had done
> > several years ago, in the early days of ext4 development. I talked
> > about some of these in some blog entries, "Fast ext4 fsck times"[3], and
> > "Fast ext4 fsck times, revisited"[4]
> >
> > [3] http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/
> > [4] http://thunk.org/tytso/blog/2009/02/26/fast-ext4-fsck-times-revisited/
> >
> > (Apologies for the really bad formatting; I recovered my blog from
> > backups a few months ago, installed onto a brand-new Wordpress
> > installation --- since the old one was security bug ridden and
> > horribly obsolete --- and I haven't had a chance to fix up some of the
> > older blog entries that had explicit HTML for tables to work with the
> > new theme.)
> >
> > One further observation from reading the ffsck paper. Their method of
> > introducing heavy file system fragmentation resulted in a file system
> > where most of the files had external extent tree blocks; that is, the
> > trees had a depth > 1. I have not observed this in file systems under
> > normal load, since most files are written once and not rewritten, and
> > those that are rewritten (e.g., database files) are not the common
> > case, and even then, generally aren't written in a random append
> > workload where there are hundreds of files in the same directory which
> > are appended to in random order. So looking at a couple of file
> > systems' fsck -v output, I find results such as this:
> >
> > Extent depth histogram: 1229346/569/3
> > Extent depth histogram: 332256/141
> > Extent depth histogram: 23253/456
> >
> > ... where the first number is the number of inodes where all of the
> > extent information is stored in the inode itself, the second number
> > is the number of inodes with a single level of external extent tree
> > blocks, and so on.
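
Aside: if anyone wants to pull the same histogram out of a file system
programmatically rather than scraping e2fsck output, the depth number comes
straight out of libext2fs.  Here's a rough, untested sketch, assuming the
usual e2fsprogs development headers and the inode-scan/extent APIs -- an
illustration of where the number comes from, not a drop-in tool:

/* exthist.c: tally extent tree depth per in-use inode, roughly the way
 * e2fsck -v builds its "Extent depth histogram" line.
 * Build (assuming the e2fsprogs dev package is installed):
 *   cc -o exthist exthist.c -lext2fs -lcom_err
 */
#include <stdio.h>
#include <et/com_err.h>
#include <ext2fs/ext2_fs.h>
#include <ext2fs/ext2fs.h>

int main(int argc, char **argv)
{
	ext2_filsys fs;
	ext2_inode_scan scan;
	struct ext2_inode inode;
	struct ext2_extent_info info;
	ext2_extent_handle_t handle;
	ext2_ino_t ino;
	unsigned long hist[8] = { 0 };
	errcode_t err;
	int i, maxd = 0;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <device>\n", argv[0]);
		return 1;
	}
	err = ext2fs_open(argv[1], 0, 0, 0, unix_io_manager, &fs);
	if (err) {
		com_err(argv[0], err, "while opening %s", argv[1]);
		return 1;
	}
	err = ext2fs_open_inode_scan(fs, 0, &scan);
	if (err) {
		com_err(argv[0], err, "while starting inode scan");
		ext2fs_close(fs);
		return 1;
	}
	while (ext2fs_get_next_inode(scan, &ino, &inode) == 0 && ino != 0) {
		/* Only in-use, extent-mapped inodes have an extent tree. */
		if (!inode.i_links_count || !(inode.i_flags & EXT4_EXTENTS_FL))
			continue;
		if (ext2fs_extent_open(fs, ino, &handle))
			continue;
		/* max_depth is 0 when every extent fits in the inode itself. */
		if (ext2fs_extent_get_info(handle, &info) == 0 &&
		    info.max_depth < 8)
			hist[info.max_depth]++;
		ext2fs_extent_free(handle);
	}
	ext2fs_close_inode_scan(scan);
	ext2fs_close(fs);

	for (i = 0; i < 8; i++)
		if (hist[i])
			maxd = i;
	printf("Extent depth histogram: ");
	for (i = 0; i <= maxd; i++)
		printf("%s%lu", i ? "/" : "", hist[i]);
	printf("\n");
	return 0;
}

For scale: with 4k blocks, the inode's 60-byte i_block area holds a 12-byte
extent header plus four 12-byte entries, and each external extent block holds
about 340 entries, so a depth-1 tree already covers up to ~1360 extents --
which is why nearly everything lands in the first or second bucket unless a
file is severely fragmented.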
> >
> > As a result, I'm not seeing the fsck time degradation resulting from
> > file system aging, because, at least with my workloads, the file
> > system isn't getting fragmented enough to result in a large number of
> > inodes with external extent tree blocks.
> >
> > We could implement schemes to optimize fsck performance for heavily
> > fragmented file systems; some could be done with just e2fsck
> > optimizations, while others would require file system format changes.
> > However, it's not clear to me that it's worth it.
> >
> > If folks would like to help run some experiments, it would be useful to
> > run a test e2fsck on a partition: "e2fsck -Fnfvtt /dev/sdb1" and look
> > at the extent depth histogram and the I/O rates for the various e2fsck
> > passes (see below for an example).
> >
> > If you have examples where the file system has a very large number of
> > inodes with extent tree depths > 1, it would be useful to see these
> > numbers, along with a description of how old the file system is, and
> > what sort of workload might have contributed to its aging.
> >
>
> I don't know about "very large", but here's what I see on the server that I
> share with some friends. Afaik it's used mostly for VM images and test
> kernels... and other parallel-write-once files. ;) This FS has been running
> since Nov. 2012. That said, I think the VM images were created without
> fallocate; some of these files have tens of thousands of tiny extents.
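
For anyone reproducing that kind of aging: the application-side fix is just
to preallocate the image before writing into it.  A minimal sketch, assuming
Linux fallocate(2) and a made-up path and size, purely to illustrate:

/* prealloc.c: reserve a hypothetical 20 GiB image file in one call so the
 * extent tree stays shallow, instead of letting the file grow by tiny
 * appends.  Build with -D_FILE_OFFSET_BITS=64 on 32-bit.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const off_t size = 20LL << 30;	/* 20 GiB, made up for illustration */
	int fd = open("/tmp/test.img", O_WRONLY | O_CREAT, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Ask the fs for all the blocks up front; with luck it hands back a
	 * handful of big extents rather than thousands of little ones. */
	if (fallocate(fd, 0, 0, size)) {
		perror("fallocate");
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}

posix_fallocate() is the portable alternative, though its fallback path on
file systems without a real fallocate is much slower.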
>
> 5386404 inodes used (4.44%, out of 121307136)
> 22651 non-contiguous files (0.4%)
> 7433 non-contiguous directories (0.1%)
> # of inodes with ind/dind/tind blocks: 0/0/0
> Extent depth histogram: 5526723/1334/16
> 202583901 blocks used (41.75%, out of 485198848)
> 0 bad blocks
> 34 large files
>
> 5207070 regular files
> 313009 directories
> 576 character device files
> 192 block device files
> 11 fifos
> 1103023 links
> 94363 symbolic links (86370 fast symbolic links)
> 73 sockets
> ------------
> 6718317 files
>
> On my main dev box, which is entirely old photos, mp3s, VM images, and kernel
> builds, I see:
>
> 2155348 inodes used (2.94%, out of 73211904)
> 14923 non-contiguous files (0.7%)
> 1528 non-contiguous directories (0.1%)
> # of inodes with ind/dind/tind blocks: 0/0/0
> Extent depth histogram: 2147966/685/3
> 85967035 blocks used (29.36%, out of 292834304)
> 0 bad blocks
> 6 large files
>
> 1862617 regular files
> 284915 directories
> 370 character device files
> 59 block device files
> 6 fifos
> 609215 links
> 7454 symbolic links (6333 fast symbolic links)
> 24 sockets
> ------------
> 2764660 files
>
> Sadly, since I've left the LTC I no longer have access to tux1, which had a
> rather horrifically fragmented ext3. Its backup server, which created a Time
> Machine-like series of "snapshots" with rsync --link-dest, took days to fsck,
> despite being ext4.
Well, I got a partial report -- the fs containing ISO images produced this fsck
output. Not terribly helpful, alas.
561392 inodes used (0.21%)
14007 non-contiguous inodes (2.5%)
# of inodes with ind/dind/tind blocks: 93077/7341/74
440877945 blocks used (82.12%)
0 bad blocks
382 large files
492651 regular files
36414 directories
270 character device files
760 block device files
3 fifos
2514 links
31930 symbolic links (31398 fast symbolic links)
4 sockets
--------
564546 files
--D