Message-Id: <20200419164812.909D45204F@d06av21.portsmouth.uk.ibm.com>
Date: Sun, 19 Apr 2020 22:18:11 +0530
From: Ritesh Harjani <riteshh@...ux.ibm.com>
To: Eric Sandeen <sandeen@...deen.net>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Cc: Jan Kara <jack@...e.cz>, "Theodore Ts'o" <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>
Subject: Re: strange allocator behavior on a 2k block fs, skipping free blocks
Hello All,
On 4/17/20 12:46 AM, Eric Sandeen wrote:
> This got picked up by xfstests generic/018 on a 2k block filesystem when it
> failed to defragment a file into 1 extent as expected.
>
> For some reason, the allocator is skipping over free blocks when it allocates
> the donor file. The attached image shows this behavior - if you do:
>
> # bunzip2 ext4.img.qcow.bz2
> # qemu-img convert -O raw ext4.img.qcow ext4.img
> # mkdir -p mnt
> # mount -o loop ext4.img mnt/
> # fallocate -l 20480 mnt/newfile
> # filefrag -v mnt/newfile
> Filesystem type is: ef53
> File size of mnt/newfile is 20480 (10 blocks of 2048 bytes)
> ext: logical_offset: physical_offset: length: expected: flags:
> 0: 0.. 1: 16962.. 16963: 2: unwritten
> 1: 2.. 9: 16968.. 16975: 8: 16964: unwritten,eof
> mnt/newfile: 2 extents found
>
> it allocates 2 extents, even though the blocks in between the extents are free:
>
> # dumpe2fs test.img | grep -w 16964
> dumpe2fs 1.42.9 (28-Dec-2013)
> Free blocks: 16964-16967, 16976-17407, 17410-17919, 17922-18431, 18434-18943, 18946-19455, 19457-19967, 19969-32767
>
My initial investigation suggests that the following is happening.
I have also verified this with logs.
1. Initially, the fallocate path requests a length of 10 blocks.
(Please note that in the fallocate path we don't set
EXT4_MB_HINT_TRY_GOAL.)
-> Since a length of 10 is not a power of 2, we choose allocation
criteria 1 and go to ext4_mb_scan_aligned() with a stripe size
of 2. In that function we only look for 2 blocks as the needed
blocks (since the stripe size is 2 blocks), and we return these
2 blocks as the allocated blocks from ext4_map_blocks.
This is where we get blocks (16962, 16963).
2. The fallocate path then requests the remaining length, which is 8.
This time, since 8 is a power of 2 (2^3), we go with criteria 0
and try the allocation via ext4_mb_simple_scan_group().
In this 2nd iteration, the buddy structures are scanned to find the
right fit for the request. That's why we see two extents in the above
results.
I guess that if we set the stripe size to 0, we will not see
this problem.
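The two-step sequence described above can be sketched as a small toy model. This is not the kernel code: the helper names (starting_criteria, first_aligned_free_run, simulate_fallocate) are made up for illustration, and the model assumes that blocks 16962-17407 formed one free range before the fallocate (inferred from the filefrag and dumpe2fs output), that the stripe-aligned scan starts at multiples of the stripe size, and that a power-of-2 request of N blocks is served from a naturally N-aligned buddy chunk. The kernel applies further conditions that are not modelled here.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def starting_criteria(length: int) -> int:
    """Toy rule from the explanation above: 0 for power-of-2 lengths, else 1."""
    return 0 if is_power_of_two(length) else 1


def first_aligned_free_run(free_start: int, free_end: int, align: int, count: int):
    """Return the first block >= free_start that is a multiple of `align`
    and leaves `count` blocks inside the free range, or None."""
    start = -(-free_start // align) * align  # round up to a multiple of align
    return start if start + count - 1 <= free_end else None


def simulate_fallocate(free_start: int, free_end: int, length: int, stripe: int):
    """Hypothetical model of the request loop; `stripe` must be > 0 here."""
    extents = []
    remaining, cursor = length, free_start
    while remaining > 0:
        if starting_criteria(remaining) == 0:
            # Criteria 0: look for a naturally aligned chunk of the full size.
            align, count = remaining, remaining
        else:
            # Criteria 1 with a stripe: only a stripe-sized piece is taken.
            align, count = stripe, min(stripe, remaining)
        start = first_aligned_free_run(cursor, free_end, align, count)
        if start is None:
            break
        extents.append((start, count))
        remaining -= count
        cursor = start + count
    return extents


if __name__ == "__main__":
    # Assumed free range before the fallocate: 16962..17407, stripe of 2 blocks.
    print(simulate_fallocate(16962, 17407, 10, 2))
```

Under these assumptions the model returns the extents (16962, 2) and (16968, 8): after the first 2 blocks are taken, the next free block is 16964, which is not a multiple of 8, so the 8-block request lands at 16968 and blocks 16964-16967 are skipped, matching the filefrag output.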
> I suppose this isn't critical, as defrag is best-effort and the allocator doesn't ever guarantee contiguous allocations, but it still seems a little odd so just thought I'd highlight it.
Others can say whether this is really a problem that needs fixing in
the long run.
-ritesh