Message-ID: <20131209180149.GA6096@thunk.org>
Date:	Mon, 9 Dec 2013 13:01:49 -0500
From:	Theodore Ts'o <tytso@....edu>
To:	linux-ext4@...r.kernel.org
Subject: FAST paper on ffsck

Andreas brought up on today's conference call Kirk McKusick's recent
changes[1] to try to improve fsck times for FFS, in response to the
recent FAST paper covering fsck speed ups for ext3, "ffsck: The Fast
Filesystem Checker"[2].

[1] http://www.mckusick.com/publications/faster_fsck.pdf
[2] https://www.usenix.org/system/files/conference/fast13/fast13-final52_0.pdf

All of the changes which Kirk outlined are ones which we had done
several years ago, in the early days of ext4 development.  I talked
about some of these in two blog entries, "Fast ext4 fsck times"[3] and
"Fast ext4 fsck times, revisited"[4].

[3] http://thunk.org/tytso/blog/2008/08/08/fast-ext4-fsck-times/
[4] http://thunk.org/tytso/blog/2009/02/26/fast-ext4-fsck-times-revisited/

(Apologies for the really bad formatting; I recovered my blog from
backups a few months ago, installed onto a brand-new Wordpress
installation --- since the old one was security bug ridden and
horribly obsolete --- and I haven't had a chance to fix up some of the
older blog entries that had explicit HTML for tables to work with the
new theme.)

One further observation from reading the ffsck paper.  Their method of
introducing heavy file system fragmentation resulted in a file system
where most of the files had external extent tree blocks; that is, the
trees had a depth > 1.  I have not observed this in file systems under
normal load, since most files are written once and not rewritten, and
those that are rewritten (e.g., database files) are not the common
case, and even then, they generally aren't written in a random append
workload where there are hundreds of files in the same directory which
are appended to in random order.  Looking at a couple of file
systems' fsck -v output, I find results such as this:

             Extent depth histogram: 1229346/569/3
             Extent depth histogram: 332256/141
             Extent depth histogram: 23253/456

... where the first number is the number of inodes where all of the
extent information is stored in the inode, the second number is the
number of inodes with a single level of external extent tree blocks,
and so on.
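
For anyone collecting these numbers, here is a minimal Python sketch
that turns a histogram line into the fraction of inodes which need
external extent tree blocks.  It is not part of e2fsprogs; the
function names are made up for illustration, and it assumes the line
format shown above.

```python
def parse_extent_histogram(line):
    """Return the inode counts from one 'Extent depth histogram' line.

    Index 0 is the count of inodes whose extents all live in the inode,
    index 1 the count with one level of external extent blocks, etc.
    """
    _, _, counts = line.partition(":")
    return [int(n) for n in counts.strip().split("/")]


def external_fraction(counts):
    """Fraction of inodes that need at least one external extent block."""
    total = sum(counts)
    return sum(counts[1:]) / total if total else 0.0


# The three histogram lines quoted above.
lines = [
    "Extent depth histogram: 1229346/569/3",
    "Extent depth histogram: 332256/141",
    "Extent depth histogram: 23253/456",
]

for line in lines:
    counts = parse_extent_histogram(line)
    print(f"{counts}: {external_fraction(counts):.4%} with external blocks")
```

A file system aged the way the ffsck paper describes would be expected
to show a much larger fraction here than any of these three.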

As a result, I'm not seeing the fsck time degradation resulting from
file system aging, because with at least my workloads, the file system
isn't getting fragmented enough to result in a large number of
inodes with external extent tree blocks.

We could implement schemes to optimize fsck performance for heavily
fragmented file systems; some could be done using just e2fsck
optimizations, and some would require file system format
changes.  However, it's not clear to me that it's worth it.

If folks would like to help run some experiments, it would be useful to
run a test e2fsck on a partition: "e2fsck -Fnfvtt /dev/sdb1" and look
at the extent depth histogram and the I/O rates for the various e2fsck
passes (see below for an example).

If you have examples where the file system has a very large number of
inodes with extent tree depths > 1, it would be useful to see these
numbers, with a description of how old the file system is, and
what sort of workload might have contributed to its aging.

Thanks, regards,

					- Ted

e2fsck 1.42.8 (20-Jun-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 668k/7692k (575k/94k), time:  0.92/ 0.42/ 0.02
Pass 1: I/O read: 11MB, write: 0MB, rate: 11.95MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 784k/15196k (466k/319k), time:  0.44/ 0.03/ 0.00
Pass 2: I/O read: 10MB, write: 0MB, rate: 22.76MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 784k/15196k (466k/319k), time:  1.60/ 0.63/ 0.02
Pass 3: Memory used: 784k/15196k (439k/346k), time:  0.00/ 0.00/ 0.00
Pass 3: I/O read: 1MB, write: 0MB, rate: 2793.30MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 784k/188k (432k/353k), time:  0.63/ 0.63/ 0.00
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 784k/188k (426k/359k), time:  4.95/ 0.16/ 0.10
Pass 5: I/O read: 19MB, write: 0MB, rate: 3.84MB/s

       13825 inodes used (0.03%, out of 47906816)
        1425 non-contiguous files (10.3%)
          11 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 12986/831
   141525383 blocks used (73.85%, out of 191627264)
           0 bad blocks
           4 large files

       11537 regular files
        2279 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
       13816 files
Memory used: 784k/188k (426k/359k), time:  7.19/ 1.42/ 0.12
I/O read: 39MB, write: 0MB, rate: 5.43MB/s
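
The per-pass I/O lines in the output above can be pulled out
mechanically when comparing several file systems.  The following is
only a sketch: the regular expression is an assumption based on the
1.42.8 output in this message (other e2fsprogs versions may print the
timing lines differently), and io_rates() and slowest_pass() are
hypothetical helper names.

```python
import re

# Assumed line format, taken from the sample e2fsck -tt output above.
IO_RE = re.compile(
    r"Pass (\d+): I/O read: (\d+)MB, write: (\d+)MB, rate: ([\d.]+)MB/s"
)

SAMPLE = """\
Pass 1: I/O read: 11MB, write: 0MB, rate: 11.95MB/s
Pass 2: I/O read: 10MB, write: 0MB, rate: 22.76MB/s
Pass 3: I/O read: 1MB, write: 0MB, rate: 2793.30MB/s
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: I/O read: 19MB, write: 0MB, rate: 3.84MB/s
"""


def io_rates(text):
    """Map pass number -> (read MB, write MB, rate in MB/s)."""
    rates = {}
    for line in text.splitlines():
        m = IO_RE.search(line)
        if m:
            rates[int(m.group(1))] = (
                int(m.group(2)),
                int(m.group(3)),
                float(m.group(4)),
            )
    return rates


def slowest_pass(rates):
    """Pass with the lowest I/O rate among passes that read any data."""
    busy = {p: r for p, r in rates.items() if r[0] > 0}
    return min(busy, key=lambda p: busy[p][2]) if busy else None


rates = io_rates(SAMPLE)
for p in sorted(rates):
    read_mb, write_mb, rate = rates[p]
    print(f"pass {p}: read {read_mb}MB at {rate}MB/s")
print("slowest pass with I/O:", slowest_pass(rates))
```

Passes that read nothing are skipped when picking the slowest one,
since a 0.00MB/s rate there says nothing about the disk.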

Note: the reason this file system has so many files with large
extents is that there are some video files which are large enough that
even when contiguous, they will require an external extent block, e.g.:

File size of 01 Yankee White.m4v is 499375730 (121918 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:   19802112..  19802112:      1:            
   1:        2..     315:   19802114..  19802427:    314:   19802113:
   2:      543..   14335:   19802655..  19816447:  13793:   19802428:
   3:    14336..   47103:   19830784..  19863551:  32768:   19816448:
   4:    47104..   73727:   19896320..  19922943:  26624:   19863552:
   5:    73728..   79871:   19955712..  19961855:   6144:   19922944:
   6:    79872..  112639:   19994624..  20027391:  32768:   19961856:
   7:   112640..  121917:   20060160..  20069437:   9278:   20027392: eof
01 Yankee White.m4v: 8 extents found

BTW, looking at the output of filefrag -v on large files, it does look
like there is some work we can do to improve the block allocation
heuristics.  These files were written without the benefit of fallocate,
but with delayed allocation, and apparently we aren't automatically
figuring out that we should be in stream mode from the get-go.  This
pattern is reproduced in most of the files in the directory:

File size of 02 Hung Out to Dry.m4v is 552382434 (134859 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:   19816448..  19816448:      1:            
   1:        2..     314:   19816450..  19816762:    313:   19816449:
   2:      542..   14335:   19816990..  19830783:  13794:   19816763:
   3:    14336..   47103:   19863552..  19896319:  32768:   19830784:
   4:    47104..   79871:   19961856..  19994623:  32768:   19896320:
   5:    79872..  112639:   20027392..  20060159:  32768:   19994624:
   6:   112640..  134858:   20070400..  20092618:  22219:   20060160: eof
02 Hung Out to Dry.m4v: 7 extents found

File size of 03 Sea Dog.m4v is 553146161 (135046 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:   20092928..  20092928:      1:            
   1:        2..     159:   20092930..  20093087:    158:   20092929:
   2:      161..     306:   20093089..  20093234:    146:   20093088:
   3:      534..   14335:   20093462..  20107263:  13802:   20093235:
   4:    14336..   47103:   20121600..  20154367:  32768:   20107264:
   5:    47104..   79871:   20187136..  20219903:  32768:   20154368:
   6:    79872..  112639:   20252672..  20285439:  32768:   20219904:
   7:   112640..  135045:   20318208..  20340613:  22406:   20285440: eof
03 Sea Dog.m4v: 8 extents found

File size of 04 The Immortals.m4v is 516091162 (125999 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:   20107264..  20107264:      1:            
   1:        2..     162:   20107266..  20107426:    161:   20107265:
   2:      164..     312:   20107428..  20107576:    149:   20107427:
   3:      540..   14335:   20107804..  20121599:  13796:   20107577:
   4:    14336..   47103:   20154368..  20187135:  32768:   20121600:
   5:    47104..   79871:   20219904..  20252671:  32768:   20187136:
   6:    79872..  112639:   20285440..  20318207:  32768:   20252672:
   7:   112640..  125998:   20340736..  20354094:  13359:   20318208: eof
04 The Immortals.m4v: 8 extents found

Looking at all of these files, actually, if we had managed to allocate
them using contiguous 32768-block extents, these 45-minute TV episodes
would have just fit inside the inode's 4 in-inode extent slots.
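
As a rough check of that estimate, the sketch below computes the
minimum number of extents each file would need if every extent were a
full 32768 blocks.  It assumes the block counts reported by filefrag
above and ignores the small holes visible in the logical offsets;
min_extents() is a hypothetical helper, not an ext4 or e2fsprogs
function.

```python
MAX_EXTENT_BLOCKS = 32768  # extent length used in the estimate above
IN_INODE_SLOTS = 4         # extent slots available inside the inode


def min_extents(blocks, max_len=MAX_EXTENT_BLOCKS):
    """Fewest extents needed to map `blocks` blocks (ceiling division)."""
    return -(-blocks // max_len)


# Block counts copied from the filefrag -v output above.
files = {
    "01 Yankee White.m4v": 121918,
    "02 Hung Out to Dry.m4v": 134859,
    "03 Sea Dog.m4v": 135046,
    "04 The Immortals.m4v": 125999,
}

for name, blocks in files.items():
    need = min_extents(blocks)
    fits = "fits" if need <= IN_INODE_SLOTS else "needs an external block"
    print(f"{name}: {need} extents ({fits})")
```

Under these assumptions four full extents cover 131072 blocks, so the
two smaller files fit in the inode and the two larger ones come out
one extent over; the estimate holds only approximately for them.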