[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87h6f6vqzj.fsf@brahms.olymp>
Date: Thu, 09 May 2024 18:23:44 +0100
From: Luis Henriques <luis.henriques@...ux.dev>
To: "Theodore Ts'o" <tytso@....edu>
Cc: Luis Henriques <luis.henriques@...ux.dev>, Zhang Yi
<yi.zhang@...weicloud.com>, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, adilger.kernel@...ger.ca, jack@...e.cz,
ritesh.list@...il.com, hch@...radead.org, djwong@...nel.org,
willy@...radead.org, zokeefe@...gle.com, yi.zhang@...wei.com,
chengzhihao1@...wei.com, yukuai3@...wei.com, wangkefeng.wang@...wei.com
Subject: Re: [PATCH v3 03/26] ext4: correct the hole length returned by
ext4_map_blocks()
On Thu 09 May 2024 12:39:53 PM -04, Theodore Ts'o wrote;
> On Thu, May 09, 2024 at 04:16:34PM +0100, Luis Henriques wrote:
>>
>> It's looks like it's easy to trigger an infinite loop here using fstest
>> generic/039. If I understand it correctly (which doesn't happen as often
>> as I'd like), this is due to an integer overflow in the 'if' condition,
>> and should be fixed with the patch below.
>
> Thanks for the report. However, I can't reproduce the failure, and
> looking at generic/039, I don't see how it could be relevant to the
> code path in question. Generic/039 creates a test symlink with two
> hard links in the same directory, syncs the file system, and then
> removes one of the hard links, and then drops access to the block
> device using dmflakey. So I don't see how the extent code would be
> involved at all. Are you sure that you have the correct test listed?
Yep, I just retested and it's definitely generic/039. I'm using a simple
test environment, with virtme-ng.
> Looking at the code in question in fs/ext4/extents.c:
>
> again:
> ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start,
> hole_start + len - 1, &es);
> if (!es.es_len)
> goto insert_hole;
>
> * There's a delalloc extent in the hole, handle it if the delalloc
> * extent is in front of, behind and straddle the queried range.
> */
> - if (lblk >= es.es_lblk + es.es_len) {
> + if (lblk >= ((__u64) es.es_lblk) + es.es_len) {
> /*
> * The delalloc extent is in front of the queried range,
> * find again from the queried start block.
> len -= lblk - hole_start;
> hole_start = lblk;
> goto again;
>
> lblk and es.es_lblk are both __u32. So the infinite loop is
> presumably because es.es_lblk + es.es_len has overflowed. This should
> never happen(tm), and in fact we have a test for this case which
If I instrument the code, I can see that es.es_len is definitely set to
EXT_MAX_BLOCKS, which will overflow.
> *should* have gotten tripped when ext4_es_find_extent_range() calls
> __es_tree_search() in fs/ext4/extents_status.c:
>
> static inline ext4_lblk_t ext4_es_end(struct extent_status *es)
> {
> BUG_ON(es->es_lblk + es->es_len < es->es_lblk);
> return es->es_lblk + es->es_len - 1;
> }
>
> So the patch is harmless, and I can see how it might fix what you were
> seeing --- but I'm a bit nervous that I can't reproduce it and the
> commit description claims that it reproduces easily; and we should
> have never allowed the entry to have gotten introduced into the
> extents status tree in the first place, and if it had been introduced,
> it should have been caught before it was returned by
> ext4_es_find_extent_range().
>
> Can you give more details about the reproducer; can you double check
> the test id, and how easily you can trigger the failure, and what is
> the hardware you used to run the test?
So, here's few more details that may clarify, and that I should have added
to the commit description:
When the test hangs, the test is blocked mounting the flakey device:
mount -t ext4 -o acl,user_xattr /dev/mapper/flakey-test /mnt/scratch
which will eventually call into ext4_ext_map_blocks(), triggering the bug.
Also, some more code instrumentation shows that after the call to
ext4_ext_find_hole(), the 'hole_start' will be set to '1' and 'len' to
'0xfffffffe'. This '0xfffffffe' value is a bit odd, but it comes from the
fact that, in ext4_ext_find_hole(), the call to
ext4_ext_next_allocated_block() will return EXT_MAX_BLOCKS and 'len' will
thus be set to 'EXT_MAX_BLOCKS - 1'.
Does this make sense?
Cheers,
--
Luis
Powered by blists - more mailing lists