linux-ext4 - Re: strange allocator behavior on a 2k block fs, skipping free blocks


Message-Id: <20200419164812.909D45204F@d06av21.portsmouth.uk.ibm.com>
Date:   Sun, 19 Apr 2020 22:18:11 +0530
From:   Ritesh Harjani <riteshh@...ux.ibm.com>
To:     Eric Sandeen <sandeen@...deen.net>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Cc:     Jan Kara <jack@...e.cz>, "Theodore Ts'o" <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>
Subject: Re: strange allocator behavior on a 2k block fs, skipping free blocks

Hello All,

On 4/17/20 12:46 AM, Eric Sandeen wrote:
> This got picked up by xfstests generic/018 on a 2k block filesystem when it
> failed to defragment a file into 1 extent as expected.
> 
> For some reason, the allocator is skipping over free blocks when it allocates
> the donor file.  The attached image shows this behavior - if you do:
> 
> # bunzip2 ext4.img.qcow.bz2
> # qemu-img convert -O raw ext4.img.qcow ext4.img
> # mkdir -p mnt
> # mount -o loop ext4.img mnt/
> # fallocate -l 20480 mnt/newfile
> # filefrag -v mnt/newfile
> Filesystem type is: ef53
> File size of mnt/newfile is 20480 (10 blocks of 2048 bytes)
>   ext:     logical_offset:        physical_offset: length:   expected: flags:
>     0:        0..       1:      16962..     16963:      2:             unwritten
>     1:        2..       9:      16968..     16975:      8:      16964: unwritten,eof
> mnt/newfile: 2 extents found
> 
> it allocates 2 extents, even though the blocks in between the extents are free:
> 
> # dumpe2fs test.img | grep -w 16964
> dumpe2fs 1.42.9 (28-Dec-2013)
>    Free blocks: 16964-16967, 16976-17407, 17410-17919, 17922-18431, 18434-18943, 18946-19455, 19457-19967, 19969-32767
> 

My initial investigation, which I also verified with logs, suggests
that the following is happening.
1. Initially, fallocate requests 10 blocks. (Please note that in the
fallocate path we don't set EXT4_MB_HINT_TRY_GOAL.)
	-> Since a length of 10 is not a power of 2, we choose allocation
criteria 1 and go to ext4_mb_scan_aligned() with a stripe size of 2.
That function only looks for 2 blocks (since the stripe size is
2 blocks), and those 2 blocks are what ext4_map_blocks returns as the
allocated blocks.
This is where we get blocks (16962, 16963).
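
To make the choice in step 1 concrete, here is a rough sketch of the
decision as I understand it. This is a simplified model, not the kernel
code: pick_scan() and is_power_of_two() are made-up helper names, and the
real allocator checks further conditions and falls back to other criteria
that are not modelled here.

```python
def is_power_of_two(n):
    # True for 1, 2, 4, 8, ...
    return n > 0 and (n & (n - 1)) == 0


def pick_scan(length, stripe):
    """Simplified model: return (criteria, name of the scan function).

    Hypothetical helper for illustration only; the kernel logic has
    more cases than this.
    """
    if is_power_of_two(length):
        # Power-of-2 requests start at criteria 0 (buddy lookup).
        return 0, "ext4_mb_simple_scan_group"
    if stripe and length % stripe == 0:
        # Length is a multiple of the stripe size: stripe-aligned scan.
        return 1, "ext4_mb_scan_aligned"
    return 1, "ext4_mb_complex_scan_group"


print(pick_scan(10, 2))  # (1, 'ext4_mb_scan_aligned')
print(pick_scan(8, 2))   # (0, 'ext4_mb_simple_scan_group')
```

With a request of 10 blocks and a stripe size of 2, the model lands on
the stripe-aligned scan, which matches what the logs show.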

2. The fallocate path then requests the remaining length, which is 8.
Since 8 is a power of 2 (2^3), this time we go with criteria 0 and
try the allocation via ext4_mb_simple_scan_group().

In this 2nd iteration, the buddy structures are scanned to find a free
chunk that fits the request. That's why we see two extents in the
above results.
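
A small worked calculation may help explain why the second extent starts
at 16968 rather than 16964. It assumes that the block group containing
these blocks starts at block 16384 (i.e. 16384 blocks per group and a
first data block of 0, which I have not checked against the image) and
that an order-3 buddy chunk must be aligned to 8 blocks relative to the
group start. first_aligned_chunk() is a made-up helper, not a kernel
function.

```python
def first_aligned_chunk(free_start, free_end, group_start, order):
    """Return the first block of the first fully free chunk of
    2**order blocks, aligned to 2**order relative to group_start,
    inside the inclusive free range [free_start, free_end].
    Returns None if no such chunk fits."""
    size = 1 << order
    offset = free_start - group_start
    # Round the in-group offset up to the next multiple of the chunk size.
    aligned = (offset + size - 1) // size * size
    start = group_start + aligned
    if start + size - 1 <= free_end:
        return start
    return None


GROUP_START = 16384  # assumption: 16384 blocks per group, first data block 0

# Free range right after 16962-16963 were allocated: 16964-17407.
print(first_aligned_chunk(16964, 17407, GROUP_START, 3))  # 16968
```

Under these assumptions, block 16964 sits at offset 580 in the group,
which is not a multiple of 8, so the first 8-block buddy chunk begins at
offset 584, i.e. block 16968. Blocks 16964-16967 are free but are skipped.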

I guess that if we set the stripe size to 0, we should not see
this problem.

> I suppose this isn't critical, as defrag is best-effort and the allocator doesn't ever guarantee contiguous allocations, but it still seems a little odd so just thought I'd highlight it.

But others can tell whether this is really a problem that needs fixing
in the long run.

-ritesh