linux-kernel - Re: kernel BUG in __clear_extent


Message-ID: <1c1f0f00-cb9a-af0a-2662-bb144e353aa0@gmx.com>
Date:   Thu, 23 Sep 2021 10:30:58 +0800
From:   Qu Wenruo <quwenruo.btrfs@....com>
To:     Hao Sun <sunhao.th@...il.com>
Cc:     clm@...com, dsterba@...e.com, Josef Bacik <josef@...icpanda.com>,
        linux-btrfs@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: kernel BUG in __clear_extent_bit



On 2021/9/23 10:24, Hao Sun wrote:
> Qu Wenruo <quwenruo.btrfs@....com> wrote on Wed, Sep 15, 2021 at 1:33 PM:
>>
>>
>>
>> On 2021/9/15 10:20 AM, Hao Sun wrote:
>>> Hello,
>>>
>>> When using Healer to fuzz the latest Linux kernel, the following crash
>>> was triggered.
>>>
>>> HEAD commit: 6880fa6c5660 Linux 5.15-rc1
>>> git tree: upstream
>>> console output:
>>> https://drive.google.com/file/d/1-9wwV6-OmBcJvHGCbMbP5_uCVvrUdTp3/view?usp=sharing
>>> kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing
>>> C reproducer: https://drive.google.com/file/d/1eXePTqMQ5ZA0TWtgpTX50Ez4q9ZKm_HE/view?usp=sharing
>>> Syzlang reproducer:
>>> https://drive.google.com/file/d/11s13louoKZ7Uz0mdywM2jmE9B1JEIt8U/view?usp=sharing
>>>
>>> If you fix this issue, please add the following tag to the commit:
>>> Reported-by: Hao Sun <sunhao.th@...il.com>
>>>
>>> loop1: detected capacity change from 0 to 32768
>>> BTRFS info (device loop1): disk space caching is enabled
>>> BTRFS info (device loop1): has skinny extents
>>> BTRFS info (device loop1): enabling ssd optimizations
>>> FAULT_INJECTION: forcing a failure.
>>> name failslab, interval 1, probability 0, space 0, times 0
>>> CPU: 1 PID: 25852 Comm: syz-executor Not tainted 5.15.0-rc1 #16
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>>> rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
>>> Call Trace:
>>>    __dump_stack lib/dump_stack.c:88 [inline]
>>>    dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
>>>    fail_dump lib/fault-inject.c:52 [inline]
>>>    should_fail+0x13c/0x160 lib/fault-inject.c:146
>>>    should_failslab+0x5/0x10 mm/slab_common.c:1328
>>>    slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
>>>    slab_alloc_node mm/slub.c:3120 [inline]
>>>    slab_alloc mm/slub.c:3214 [inline]
>>>    kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
>>>    alloc_extent_state+0x1e/0x1c0 fs/btrfs/extent_io.c:340
>>
>> This is one of the core systems btrfs uses, and we really don't want
>> it to fail.
>>
>> That is why it does some preallocation to prevent failure.
>>
>> But in the error injection case, we can still hit the BUG_ON() that is
>> used to catch ENOMEM.
>>
>
> Hello,
>
> The fuzzer repeatedly triggered the following crashes when `fault
> injection` was enabled.
>
> HEAD commit: 92477dd1faa6 Merge tag 's390-5.15-ebpf-jit-fixes'
> git tree: upstream
> kernel config: https://drive.google.com/file/d/1KgvcM8i_3hQiOL3fUh3JFpYNQM4itvV4/view?usp=sharing
> [1] kernel BUG in btrfs_free_tree_block (fs/btrfs/extent-tree.c:3297):
> https://paste.ubuntu.com/p/ZtzVKWbcGm/
> [2] kernel BUG in clear_state_bit (fs/btrfs/extent_io.c:658!):
> https://paste.ubuntu.com/p/hps2wXPG2b/
> [3] kernel BUG in set_extent_bit (fs/btrfs/extent_io.c:1021):
> https://paste.ubuntu.com/p/dcptjYYxgd/
> [4] kernel BUG in set_state_bits (fs/btrfs/extent_io.c:939):
> https://paste.ubuntu.com/p/NV9qtKB4KZ/
>
> All of the above crashes were triggered directly by the `BUG_ON()` macro
> at the corresponding location.
> Most of these `BUG_ON()`s were hit due to `ENOMEM` when a fault was injected.
> Would it be better for btrfs to handle the `ENOMEM` error, e.g.,
> return gracefully, rather than panic the kernel?

__clear_extent_bit() is part of one of the critical sections that we
really rely on to work correctly.

We even implemented a preallocation scheme to prevent such problems, but
preallocation cannot completely rule out failures under error injection.
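
To illustrate why preallocation helps but does not fully protect against
injected failures, here is a minimal user-space sketch in C. It is not the
btrfs code: struct state, state_alloc(), split_range() and the
inject_failures knob are all hypothetical names invented for this example.
The idea shown is only the general pattern: allocate before entering the
critical section, retry once inside it, and treat a second failure as the
point where a BUG_ON()-style check would fire.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a range-state record; not the btrfs struct. */
struct state {
	unsigned long start, end;
};

/* Hypothetical fault-injection knob: the next N allocations fail. */
static int inject_failures;

static struct state *state_alloc(void)
{
	if (inject_failures > 0) {
		inject_failures--;
		return NULL;	/* simulated ENOMEM */
	}
	return calloc(1, sizeof(struct state));
}

/*
 * Split orig = [start, end] into [start, mid - 1] and [mid, end].
 * Returns 0 on success and -1 if no memory could be obtained.
 */
static int split_range(struct state *orig, unsigned long mid,
		       struct state **out_new)
{
	/* Step 1: preallocate before the critical section would begin. */
	struct state *prealloc = state_alloc();

	assert(mid > orig->start && mid <= orig->end);

	/* A real implementation would take its lock here. */

	/* Step 2: last-chance retry if the preallocation failed. */
	if (!prealloc)
		prealloc = state_alloc();

	/*
	 * Both attempts failed. Code that asserts instead of returning
	 * would crash here; this sketch reports the failure and leaves
	 * orig untouched.
	 */
	if (!prealloc)
		return -1;

	prealloc->start = mid;
	prealloc->end = orig->end;
	orig->end = mid - 1;
	*out_new = prealloc;
	return 0;
}
```

With inject_failures set to 1 the retry absorbs the failure and the split
succeeds; with 2 both attempts fail, which is the situation fault injection
can force no matter how the preallocation is arranged.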

We do need to handle these errors better, but that would be a
pretty big project, not something that can be done easily right now.
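
As a rough illustration of what returning the error gracefully involves,
the following user-space C sketch propagates -ENOMEM to the caller and
undoes partial work first. It is an assumption-laden toy, not a proposal
for btrfs: struct node, try_alloc(), mark_one(), mark_many() and the
fail_after knob are hypothetical names made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical record for one marked range, kept in a simple list. */
struct node {
	unsigned long start, end;
	struct node *next;
};

/* Hypothetical knob: fail the Nth allocation (0-based); -1 means never. */
static int fail_after = -1;
static int alloc_count;

static void *try_alloc(size_t size)
{
	if (fail_after >= 0 && alloc_count++ == fail_after)
		return NULL;	/* simulated ENOMEM */
	return malloc(size);
}

/* Mark one range; report -ENOMEM instead of crashing. */
static int mark_one(struct node **head, unsigned long start,
		    unsigned long end)
{
	struct node *n;

	assert(head != NULL);
	n = try_alloc(sizeof(*n));
	if (!n)
		return -ENOMEM;
	n->start = start;
	n->end = end;
	n->next = *head;
	*head = n;
	return 0;
}

/*
 * Mark several ranges as one operation. If any step fails, every node
 * added by this call is removed again so the list is left unchanged.
 */
static int mark_many(struct node **head, const unsigned long (*r)[2],
		     size_t count)
{
	struct node *saved = *head;
	size_t i;

	for (i = 0; i < count; i++) {
		int ret = mark_one(head, r[i][0], r[i][1]);

		if (ret) {
			while (*head != saved) {
				struct node *victim = *head;

				*head = victim->next;
				free(victim);
			}
			return ret;
		}
	}
	return 0;
}
```

Even in this toy, the caller has to check the return value and the callee
has to roll back its partial changes. Doing the same across every call
chain that reaches the extent state code is what makes this a large job.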

Thanks,
Qu
>
> Regards
> Hao
>