Message-ID: <510942C3.1070503@redhat.com>
Date: Wed, 30 Jan 2013 09:56:51 -0600
From: Eric Sandeen <sandeen@...hat.com>
To: Bron Gondwana <brong@...tmail.fm>
CC: linux-ext4@...r.kernel.org, Rob Mueller <robm@...tmail.fm>
Subject: Re: fallocate creating fragmented files
On 1/30/13 12:35 AM, Bron Gondwana wrote:
> On Wed, Jan 30, 2013, at 05:05 PM, Eric Sandeen wrote:
>> On 1/29/13 11:46 PM, Bron Gondwana wrote:
>>> Hi All,
>>>
>>> I'm trying to understand why my ext4 filesystem is creating highly fragmented files even though it's only just over 50% full.
>>
>> It's at least possible that freespace is very fragmented; you could try the "e2freefrag" command to see.
>
> [brong@...p14 ~]$ e2freefrag /dev/md0
> Device: /dev/md0
> Blocksize: 1024 bytes
> Total blocks: 62522624
> Free blocks: 26483551 (42.4%)
>
> Min. free extent: 1 KB
> Max. free extent: 757 KB
> Avg. free extent: 14 KB
> Num. free extent: 1940838
>
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range :  Free extents   Free Blocks  Percent
>    1K...    2K-  :        538480        538480    2.03%
>    2K...    4K-  :        362189        870860    3.29%
>    4K...    8K-  :        321158       1681591    6.35%
>    8K...   16K-  :        268848       2934959   11.08%
>   16K...   32K-  :        210746       4697440   17.74%
>   32K...   64K-  :        151755       6738418   25.44%
>   64K...  128K-  :         63761       5512870   20.82%
>  128K...  256K-  :         20563       3552580   13.41%
>  256K...  512K-  :          3308       1047995    3.96%
>  512K... 1024K-  :            30         17615    0.07%
Ok, TBH I'm not certain why the allocator is doing just what it's doing.
There are quite a lot of larger-than-3-block free spaces. OTOH, it might be
trying for some kind of locality.
I think it'd take some digging into the allocator behavior; there may
be tracepoints that'd help.
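If you feel like digging, here's a rough sketch of the kind of tracing
I have in mind, assuming your kernel exposes the ext4 tracepoints and
debugfs is mounted (the log path is just an example):

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/ext4/ext4_request_blocks/enable
  # echo 1 > events/ext4/ext4_mballoc_alloc/enable
  # cat trace_pipe > /tmp/ext4-alloc.log &   # capture allocations while the workload runs
  ... reproduce the fallocate workload ...
  # echo 0 > events/ext4/ext4_request_blocks/enable
  # echo 0 > events/ext4/ext4_mballoc_alloc/enable

The mballoc events record the original/goal request and the result for
each allocation, so they should show whether mballoc is asking for big
chunks and coming up short, or only asking for small ones to begin with.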
-Eric
>>> Now looking at the verbose output, we can see that there are many extents of just 3 or 4 blocks:
>>>
>>> [brong@...p14 conf]$ filefrag -v testfile | awk '{print $5}' | sort -n | uniq -c | head
>>> 2
>>> 1 is
>>> 1 length
>>> 1 unwritten
>>> 6 3
>>> 10 4
>>> 6 5
>>> 5 6
>>> 3 7
>>> 1 8
>>
>> But longer extents too, right:
>>
>> $ filefrag -v testfile | awk '{print $5}' | sort -n | uniq -c | tail
>> 1 162
>> 1 164
>> 1 179
>> 1 188
>> 1 215
>> 1 231
>> 1 233
>> 1 255
>> 1 322
>> 1 357
>>
>>> Yet looking at the next file,
>>>
>>> [brong@...p14 conf]$ filefrag -v testfile2 | awk '{print $5}' | sort -n | uniq -c | tail
>>> 1 173
>>> 1 175
>>> 1 178
>>> 1 184
>>> 1 187
>>> 1 189
>>> 1 194
>>> 1 289
>>> 1 321
>>> 1 330
>>>
>>
>> and presumably shorter extents at the beginning?
>
> Well, that output is sorted by extent length. Yes, there were shorter extents too.
>
>> So it sounds like both files are a mix of long & short extents.
>
> Definitely.
>
>>> There are multiple extents of hundreds of blocks in length. Why weren't they used in allocating the first file?
>>
>> I'm not sure, offhand. But just to be clear, while contiguous allocations are usually a nice side-effect of fallocate, nothing at all guarantees it. It only guarantees that you'll have that space available for future writes.
>
> Sure. I was hoping it would help though!
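For a quick standalone check of that on an otherwise idle filesystem,
something along these lines would show it. This is just a sketch; the
path and size are made up, and it assumes util-linux's fallocate(1) is
installed:

  $ fallocate -l 64M /path/to/fs/falloc-test   # preallocate 64MB as unwritten extents
  $ filefrag -v /path/to/fs/falloc-test        # list the extents; they may well be discontiguous
  $ rm /path/to/fs/falloc-test

If even that comes out heavily fragmented, it points at free space
fragmentation or the allocator rather than at your application's write
pattern.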
>
>> Still, it'd be interesting to figure out why the allocator is behaving this way.
>> It'd be interesting to see the freefrag info; the allocator might really be in scavenger mode.
>
> What do you think of the output above? Is that reasonable? I'll check a more recently set-up machine.
>
> [brong@...p30 ~]$ e2freefrag /dev/sdf1
> Device: /dev/sdf1
> Blocksize: 1024 bytes
>
> Total blocks: 97124320
> Free blocks: 68429391 (70.5%)
>
> Min. free extent: 1 KB
> Max. free extent: 1009 KB
> Avg. free extent: 25 KB
> Num. free extent: 2781696
>
> HISTOGRAM OF FREE EXTENT SIZES:
> Extent Size Range :  Free extents   Free Blocks  Percent
>    1K...    2K-  :        705257        705257    1.03%
>    2K...    4K-  :        553577       1348712    1.97%
>    4K...    8K-  :        349406       1789755    2.62%
>    8K...   16K-  :        289102       3185026    4.65%
>   16K...   32K-  :        279061       6307452    9.22%
>   32K...   64K-  :        271631      12321046   18.01%
>   64K...  128K-  :        205191      18340308   26.80%
>  128K...  256K-  :        110082      19121199   27.94%
>  256K...  512K-  :         16962       5584384    8.16%
>  512K... 1024K-  :          1427        882388    1.29%
>
> This one is on 100GB SSDs from some other vendor (can't remember which) on hardware RAID1. It's never been more than about 30% full. It looks like a similar histogram of extent sizes. Again it's a 1KB block size (piles of small files on these filesystems).
>
> [brong@...p30 ~]$ dumpe2fs -h /dev/sdf1
> dumpe2fs 1.42.4 (12-Jun-2012)
> Filesystem volume name: ssd30
> Last mounted on: /mnt/ssd30
> Filesystem UUID: c2623b6a-b3f4-4a5a-99e3-495f29112ba6
> Filesystem magic number: 0xEF53
> Filesystem revision #: 1 (dynamic)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super huge_file uninit_bg dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options: (none)
> Filesystem state: clean
> Errors behavior: Continue
> Filesystem OS type: Linux
> Inode count: 12140544
> Block count: 97124320
> Reserved block count: 4856216
> Free blocks: 68429391
> Free inodes: 7157347
> First block: 1
> Block size: 1024
> Fragment size: 1024
> Reserved GDT blocks: 256
> Blocks per group: 8192
> Fragments per group: 8192
> Inodes per group: 1024
> Inode blocks per group: 256
> Flex block group size: 16
> Filesystem created: Tue Aug 2 07:39:40 2011
> Last mount time: Thu Jan 24 23:15:41 2013
> Last write time: Thu Jan 24 23:15:41 2013
> Mount count: 10
> Maximum mount count: 39
> Last checked: Tue Aug 2 07:39:40 2011
> Check interval: 15552000 (6 months)
> Next check after: Sun Jan 29 06:39:40 2012
> Lifetime writes: 13 TB
> Reserved blocks uid: 0 (user root)
> Reserved blocks gid: 0 (group root)
> First inode: 11
> Inode size: 256
> Required extra isize: 28
> Desired extra isize: 28
> Journal inode: 8
> Default directory hash: half_md4
> Directory Hash Seed: 0ecbfe75-57e3-4d4e-b4a8-bf0114dc0997
> Journal backup: inode blocks
> Journal features: journal_incompat_revoke
> Journal size: 32M
> Journal length: 32768
> Journal sequence: 0x32367a0d
> Journal start: 1537
>
> Regards,
>
> Bron.
>