linux-kernel - Re: [PATCH] ext4: reject 1k block fs on the first block of disk

Message-ID: <4e5fb36f-d234-1f94-5e6c-746aef612bb6@linaro.org>
Date:   Wed, 15 Feb 2023 11:53:40 +0000
From:   Tudor Ambarus <tudor.ambarus@...aro.org>
To:     Theodore Ts'o <tytso@....edu>, Jun Nie <jun.nie@...aro.org>
Cc:     "Darrick J. Wong" <djwong@...nel.org>, adilger.kernel@...ger.ca,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
        Lee Jones <joneslee@...gle.com>
Subject: Re: [PATCH] ext4: reject 1k block fs on the first block of disk



On 2/15/23 11:46, Tudor Ambarus wrote:
> Hi, Ted!
> 
> On 2/15/23 04:32, Theodore Ts'o wrote:
>> On Wed, Jan 04, 2023 at 09:58:03AM +0800, Jun Nie wrote:
>>> Darrick J. Wong <djwong@...nel.org> wrote on Wed, Jan 4, 2023 at 03:17:
>>>>
>>>> On Thu, Dec 29, 2022 at 09:45:02AM +0800, Jun Nie wrote:
>>>>> For 1k-block filesystems, the filesystem starts at block 1, not block 0.
>>>>> If start_fsb is 0, it will be bumped up to s_first_data_block. Then
>>>>> ext4_get_group_no_and_offset doesn't know what to do and returns garbage
>>>>> results (blockgroup 2^32-1). The underflow makes the index
>>>>> exceed es->s_groups_count in ext4_get_group_info() and triggers the BUG_ON.
>>>>>
>>>>> Fixes: 4a4956249dac0 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
>>>>> Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
>>>>> Reported-by: syzbot+6be2b977c89f79b6b153@...kaller.appspotmail.com
>>>>> Signed-off-by: Jun Nie <jun.nie@...aro.org>
>>>>> ---
>>>>>   fs/ext4/fsmap.c | 6 ++++++
>>>>>   1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
>>>>> index 4493ef0c715e..1aef127b0634 100644
>>>>> --- a/fs/ext4/fsmap.c
>>>>> +++ b/fs/ext4/fsmap.c
>>>>> @@ -702,6 +702,12 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
>>>>>                if (handlers[i].gfd_dev > head->fmh_keys[0].fmr_device)
>>>>>                        memset(&dkeys[0], 0, sizeof(struct ext4_fsmap));
>>>>>
>>>>> +             /*
>>>>> +              * Re-check the range after above limit operation and reject
>>>>> +              * 1K fs on block 0 as fs should start block 1. */
>>>>> +             if (dkeys[0].fmr_physical ==0 && dkeys[1].fmr_physical == 0)
>>>>> +                     continue;
>>>>
>>>> ...and if this filesystem has 4k blocks, and therefore *does* define a
>>>> block 0?
>>>
>>> Yes, this is a real corner case test :-)
>>
>> So I'm really nervous about this change.  I don't understand the code;
>> and I don't understand how the reproducer works.  I can certainly
>> reproduce it using the reproducer found here[1], but it seems to
>> require running multiple processes all creating loop devices and then
>> running FS_IOC_GETFSMAP.
>>
>> [1] https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
>>
>> If I change the reproducer to just run the execute_one() once, it
>> doesn't trigger the bug.  It seems to only trigger when you have
>> multiple processes all racing to create a loop device, mount the file
>> system, try running FS_IOC_GETFSMAP --- and then delete the loop device
>> without actually unmounting the file system.  Which is ***weird***.
>>
>> I've tried taking the image, and just running "xfs_io -c fsmap /mnt",
>> and that doesn't trigger it either.
>>
>> And I don't understand the reply to Darrick's question about why it's
>> safe to add the check since for 4k block file systems, block 0 *is*
>> valid.
>>
>> So if someone can explain to me what is going on here with this code
>> (there are too many abstractions and what's going on with keys is just
>> making my head hurt), *and* what the change actually does, and how to
>> reproduce the problem with a ***simple*** reproducer -- the syzbot
>> mess doesn't count, that would be great.  But applying a change that I
>> don't understand to code I don't understand, to fix a reproducer which
>> I also don't understand, just doesn't make me feel comfortable.
>>
> 
> Let me share what I've understood so far. The low key is zeroed. The
> high key is defined and uses an fmr_physical of value zero, which is
> smaller than the first data block of the 1k-block ext4 fs (which starts
> at offset 1024).
> 
> -> ext4_getfsmap_datadev()
>    keys[0].fmr_physical = 0, keys[1].fmr_physical = 0
>    bofs = le32_to_cpu(sbi->s_es->s_first_data_block) = 1, eofs = 256
>    start_fsb = keys[0].fmr_physical = 1, end_fsb = keys[1].fmr_physical = 0
>    -> ext4_get_group_no_and_offset()
>      blocknr = 1, le32_to_cpu(es->s_first_data_block) =1
>    start_ag = 0, first_cluster = 0
>    ->
>      blocknr = 0, le32_to_cpu(es->s_first_data_block) =1
>    end_ag = 4294967295, last_cluster = 8191

Because of poor key validation, we get a wrong end_ag, which eventually
causes the BUG_ON.
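
The wrong end_ag in the trace above can be reproduced in userspace. What follows is only a minimal sketch of the arithmetic, not the kernel code: group_no_and_offset() is a hypothetical stand-in for ext4_get_group_no_and_offset(), and 8192 blocks per group is an assumption chosen to match the last_cluster = 8191 value in the trace.

```c
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for ext4_get_group_no_and_offset().
 * blocknr is 64-bit and unsigned, so subtracting first_data_block from
 * a smaller blocknr wraps around instead of going negative.
 */
static void group_no_and_offset(uint64_t blocknr, uint32_t first_data_block,
				uint32_t blocks_per_group,
				uint32_t *group, uint32_t *offset)
{
	/* Wraps to 2^64 - 1 when blocknr = 0 and first_data_block = 1. */
	blocknr -= first_data_block;

	*offset = (uint32_t)(blocknr % blocks_per_group);
	/* The quotient is truncated to a 32-bit group number. */
	*group = (uint32_t)(blocknr / blocks_per_group);
}
```

With first_data_block = 1 and 8192 blocks per group, blocknr = 1 yields group 0 and offset 0, while blocknr = 0 yields group 4294967295 and offset 8191, matching the start_ag/end_ag values in the trace.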

> 
>    Then there's a loop that runs while info->gfi_agno <= end_ag; that
> will trigger the BUG_ON in ext4_get_group_info() as the group nr exceeds
> EXT4_SB(sb)->s_groups_count
>    -> ext4_mballoc_query_range()
>      -> ext4_mb_load_buddy()
>        -> ext4_mb_load_buddy_gfp()
>          -> ext4_get_group_info()
> 
> It's an out-of-bounds request, and Darrick suggested not returning any
> mapping for the byte range 0-1023 on the 1k-block filesystem. The
> alternative would be to return -EINVAL when the high key has an
> fmr_physical of value zero on the 1k-block fs.
> 
> To reproduce this, one would have to create a 1k-block ext4 fs
> and pass a high key with fmr_physical of value zero, so I would
> expect to reproduce it with something like this:
> xfs_io -c 'fsmap -d 0 0' /mnt/scratch
> 
> However, when doing this I notice that in
> xfsprogs-dev/io/fsmap.c l->fmr_device and h->fmr_device have value
> zero, FS_IOC_GETFSMAP is called, and then we receive no entries
> (head->fmh_entries = 0). Now I'm trying to see what I'm doing wrong and
> how to reproduce the bug.
> 
> Cheers,
> ta
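
The two options mentioned above (return no mapping, or return -EINVAL) could be expressed as a range check along the following lines. This is a rough sketch under stated assumptions, not a proposed patch: check_high_key() and enum range_policy are hypothetical names, and the check assumes fmr_physical is a byte offset, as in the 0-1023 range discussed above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical switch between the two behaviours discussed in the thread. */
enum range_policy { RANGE_SKIP, RANGE_REJECT };

/*
 * Hypothetical helper: returns 1 if the query may proceed, 0 if it
 * should be skipped with no mappings, or -EINVAL if it is rejected.
 */
static int check_high_key(uint64_t high_physical_bytes,
			  uint32_t first_data_block, uint32_t block_size,
			  enum range_policy policy)
{
	/* Byte offset where the filesystem's first data block begins. */
	uint64_t fs_start = (uint64_t)first_data_block * block_size;

	if (high_physical_bytes >= fs_start)
		return 1;

	return policy == RANGE_REJECT ? -EINVAL : 0;
}
```

Because s_first_data_block is 0 on a 4k-block filesystem, fs_start is 0 there and the check never fires, so block 0 stays valid in that case.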