Message-Id: <1359527713.648.140661184334613.06CF38D4@webmail.messagingengine.com>
Date:	Wed, 30 Jan 2013 17:35:13 +1100
From:	Bron Gondwana <brong@...tmail.fm>
To:	Eric Sandeen <sandeen@...hat.com>
Cc:	linux-ext4@...r.kernel.org, Rob Mueller <robm@...tmail.fm>
Subject: Re: fallocate creating fragmented files

On Wed, Jan 30, 2013, at 05:05 PM, Eric Sandeen wrote:
> On 1/29/13 11:46 PM, Bron Gondwana wrote:
> > Hi All,
> > 
> > I'm trying to understand why my ext4 filesystem is creating highly fragmented files even though it's only just over 50% full.
> 
> It's at least possible that freespace is very fragmented; you could try the "e2freefrag" command to see.

[brong@...p14 ~]$ e2freefrag /dev/md0
Device: /dev/md0
Blocksize: 1024 bytes
Total blocks: 62522624
Free blocks: 26483551 (42.4%)

Min. free extent: 1 KB 
Max. free extent: 757 KB
Avg. free extent: 14 KB
Num. free extent: 1940838

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    1K...    2K-  :        538480        538480    2.03%
    2K...    4K-  :        362189        870860    3.29%
    4K...    8K-  :        321158       1681591    6.35%
    8K...   16K-  :        268848       2934959   11.08%
   16K...   32K-  :        210746       4697440   17.74%
   32K...   64K-  :        151755       6738418   25.44%
   64K...  128K-  :         63761       5512870   20.82%
  128K...  256K-  :         20563       3552580   13.41%
  256K...  512K-  :          3308       1047995    3.96%
  512K... 1024K-  :            30         17615    0.07%
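
The summary figures above can be cross-checked with simple arithmetic. The sketch below assumes the reported average is just the free space divided by the number of free extents; the function name and parameters are made up for illustration and are not part of e2freefrag.

```python
def avg_free_extent_kb(free_blocks: int, num_extents: int,
                       block_size_bytes: int = 1024) -> float:
    """Mean free-extent size in KB: total free space / number of free extents."""
    return free_blocks * block_size_bytes / 1024 / num_extents


# Figures copied from the e2freefrag summary for /dev/md0 above.
avg = avg_free_extent_kb(free_blocks=26483551, num_extents=1940838)
print(f"{avg:.1f} KB")
```

This gives about 13.6 KB, which matches the reported "Avg. free extent: 14 KB" after rounding.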

> > Now looking at the verbose output, we can see that there are many extents of just 3 or 4 blocks:
> > 
> > [brong@...p14 conf]$ filefrag -v testfile | awk '{print $5}' | sort -n | uniq -c | head
> >       2 
> >       1 is
> >       1 length
> >       1 unwritten
> >       6 3
> >      10 4
> >       6 5
> >       5 6
> >       3 7
> >       1 8
> 
> But longer extents too, right:
> 
> $ filefrag -v testfile | awk '{print $5}' | sort -n | uniq -c | tail
>       1 162
>       1 164
>       1 179
>       1 188
>       1 215
>       1 231
>       1 233
>       1 255
>       1 322
>       1 357
> 
> > Yet looking at the next file,
> > 
> > [brong@...p14 conf]$ filefrag -v testfile2 | awk '{print $5}' | sort -n | uniq -c | tail
> >       1 173
> >       1 175
> >       1 178
> >       1 184
> >       1 187
> >       1 189
> >       1 194
> >       1 289
> >       1 321
> >       1 330
> > 
> 
> and presumably shorter extents at the beginning?

Well, that's sorted.  Yes, there were shorter extents too.
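
A rough Python equivalent of the awk/sort/uniq pipeline above, which tallies only the numeric values in field 5 so that stray words such as "is", "length" and "unwritten" drop out of the counts. The sample text is invented for illustration and only loosely modelled on filefrag -v output; the function name is hypothetical.

```python
from collections import Counter


def extent_length_counts(filefrag_output: str, field: int = 5) -> Counter:
    """Count numeric values found in the given whitespace-separated field."""
    counts = Counter()
    for line in filefrag_output.splitlines():
        parts = line.split()
        # Skip short lines and non-numeric tokens (headers, flag words).
        if len(parts) >= field and parts[field - 1].isdecimal():
            counts[int(parts[field - 1])] += 1
    return counts


# Invented sample, NOT real output from the machines discussed here.
# Assumption: a row without an "expected" value has its flags in field 5.
sample = """\
Filesystem type is: ef53
File size of testfile is 1048576 (1024 blocks, blocksize 1024)
 ext logical physical expected length flags
   0       0   100000              3 unwritten
   1       3   200000   100003     4 unwritten
   2       7   300000   200004     4 unwritten,eof
testfile: 3 extents found
"""

counts = extent_length_counts(sample)
for length, n in sorted(counts.items()):
    print(n, length)
```

Like the original pipeline, this looks at field 5 only, so any row whose length sits in a different field is still missed; it merely filters out the non-numeric noise.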

> So it sounds like both files are a mix of long & short extents.

Definitely. 

> > There are multiple extents of hundreds of blocks in length.  Why weren't they used in allocating the first file?
> 
> I'm not sure, offhand.  But just to be clear, while contiguous allocations are usually a nice side-effect of fallocate, nothing at all guarantees it.  It only guarantees that you'll have that space available for future writes.

Sure.  I was hoping it would help though!
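
The guarantee Eric describes can be seen from user space. This is a minimal sketch, assuming a Linux system where Python's os.posix_fallocate is available (it is a wrapper around posix_fallocate(3), not the exact call used for the test files in this thread); the helper name and the temporary path are made up. It reserves space and reports the resulting size, and says nothing about how the blocks are laid out on disk.

```python
import os
import tempfile


def preallocate(path: str, size: int) -> os.stat_result:
    """Create path if needed, reserve size bytes for it, return its stat."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # Ensures the space is allocated; contiguity is not promised.
        os.posix_fallocate(fd, 0, size)
        return os.fstat(fd)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmpdir:
    st = preallocate(os.path.join(tmpdir, "testfile"), 1 << 20)
    print(st.st_size, "bytes,", st.st_blocks, "512-byte blocks allocated")
```

To see how fragmented the result is, the file still has to be inspected separately, for example with filefrag -v as above.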

> Still, it'd be interesting to figure out why the allocator is behaving this way.
> It'd be interesting to see the freefrag info, the allocator might really be in scavenger mode.

What do you think of the output above?  Is that reasonable?  I'll check a more recently set-up machine.

[brong@...p30 ~]$ e2freefrag /dev/sdf1
Device: /dev/sdf1
Blocksize: 1024 bytes

Total blocks: 97124320
Free blocks: 68429391 (70.5%)

Min. free extent: 1 KB 
Max. free extent: 1009 KB
Avg. free extent: 25 KB
Num. free extent: 2781696

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
    1K...    2K-  :        705257        705257    1.03%
    2K...    4K-  :        553577       1348712    1.97%
    4K...    8K-  :        349406       1789755    2.62%
    8K...   16K-  :        289102       3185026    4.65%
   16K...   32K-  :        279061       6307452    9.22%
   32K...   64K-  :        271631      12321046   18.01%
   64K...  128K-  :        205191      18340308   26.80%
  128K...  256K-  :        110082      19121199   27.94%
  256K...  512K-  :         16962       5584384    8.16%
  512K... 1024K-  :          1427        882388    1.29%

This one is 100GB SSDs from some other vendor (can't remember which) on hardware RAID1.  It's never been more than about 30% full.  The histogram of extent sizes looks similar.  Again it's a 1KB block size (piles of small files on these filesystems).
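
One way to put a number on the comparison is to add up the Percent column for the larger buckets of each histogram. The sketch below does that in Python; the regular expression assumes the row layout shown in the two e2freefrag outputs in this message, and the function name and the 64K threshold are arbitrary choices for illustration.

```python
import re

# Matches rows such as "   64K...  128K-  :  63761  5512870  20.82%".
ROW = re.compile(
    r"^\s*(\d+)K\.\.\.\s*(\d+)K-\s*:\s*(\d+)\s+(\d+)\s+([\d.]+)%\s*$"
)


def share_at_or_above(histogram: str, min_kb: int) -> float:
    """Sum the Percent column for buckets whose lower bound is >= min_kb."""
    total = 0.0
    for line in histogram.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) >= min_kb:
            total += float(m.group(5))
    return round(total, 2)


# Rows copied from the /dev/md0 output earlier in this message.
MD0 = """\
    1K...    2K-  :        538480        538480    2.03%
    2K...    4K-  :        362189        870860    3.29%
    4K...    8K-  :        321158       1681591    6.35%
    8K...   16K-  :        268848       2934959   11.08%
   16K...   32K-  :        210746       4697440   17.74%
   32K...   64K-  :        151755       6738418   25.44%
   64K...  128K-  :         63761       5512870   20.82%
  128K...  256K-  :         20563       3552580   13.41%
  256K...  512K-  :          3308       1047995    3.96%
  512K... 1024K-  :            30         17615    0.07%
"""

# Rows copied from the /dev/sdf1 output above.
SDF1 = """\
    1K...    2K-  :        705257        705257    1.03%
    2K...    4K-  :        553577       1348712    1.97%
    4K...    8K-  :        349406       1789755    2.62%
    8K...   16K-  :        289102       3185026    4.65%
   16K...   32K-  :        279061       6307452    9.22%
   32K...   64K-  :        271631      12321046   18.01%
   64K...  128K-  :        205191      18340308   26.80%
  128K...  256K-  :        110082      19121199   27.94%
  256K...  512K-  :         16962       5584384    8.16%
  512K... 1024K-  :          1427        882388    1.29%
"""

for name, hist in (("/dev/md0", MD0), ("/dev/sdf1", SDF1)):
    print(name, share_at_or_above(hist, 64), "% of free space in extents >= 64K")
```

With the rows above that works out to roughly 38% for /dev/md0 and 64% for /dev/sdf1.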

[brong@...p30 ~]$ dumpe2fs -h /dev/sdf1
dumpe2fs 1.42.4 (12-Jun-2012)
Filesystem volume name:   ssd30
Last mounted on:          /mnt/ssd30
Filesystem UUID:          c2623b6a-b3f4-4a5a-99e3-495f29112ba6
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              12140544
Block count:              97124320
Reserved block count:     4856216
Free blocks:              68429391
Free inodes:              7157347
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      256
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         1024
Inode blocks per group:   256
Flex block group size:    16
Filesystem created:       Tue Aug  2 07:39:40 2011
Last mount time:          Thu Jan 24 23:15:41 2013
Last write time:          Thu Jan 24 23:15:41 2013
Mount count:              10
Maximum mount count:      39
Last checked:             Tue Aug  2 07:39:40 2011
Check interval:           15552000 (6 months)
Next check after:         Sun Jan 29 06:39:40 2012
Lifetime writes:          13 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      0ecbfe75-57e3-4d4e-b4a8-bf0114dc0997
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             32M
Journal length:           32768
Journal sequence:         0x32367a0d
Journal start:            1537

Regards,

Bron.
-- 
  Bron Gondwana
  brong@...tmail.fm
