linux-kernel - Re: kernel BUG in __clear_extent

Message-ID: <CACkBjsauCShYkOdNU2snmJyLNSmdMvK7C0HbtMfKhoEXuUOSJg@mail.gmail.com>
Date:   Thu, 23 Sep 2021 10:24:51 +0800
From:   Hao Sun <sunhao.th@...il.com>
To:     Qu Wenruo <quwenruo.btrfs@....com>
Cc:     clm@...com, dsterba@...e.com, Josef Bacik <josef@...icpanda.com>,
        linux-btrfs@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: kernel BUG in __clear_extent_bit

On Wed, Sep 15, 2021 at 1:33 PM, Qu Wenruo <quwenruo.btrfs@....com> wrote:
>
>
>
> On 2021/9/15 10:20 AM, Hao Sun wrote:
> > Hello,
> >
> > When using Healer to fuzz the latest Linux kernel, the following crash
> > was triggered.
> >
> > HEAD commit: 6880fa6c5660 Linux 5.15-rc1
> > git tree: upstream
> > console output:
> > https://drive.google.com/file/d/1-9wwV6-OmBcJvHGCbMbP5_uCVvrUdTp3/view?usp=sharing
> > kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing
> > C reproducer: https://drive.google.com/file/d/1eXePTqMQ5ZA0TWtgpTX50Ez4q9ZKm_HE/view?usp=sharing
> > Syzlang reproducer:
> > https://drive.google.com/file/d/11s13louoKZ7Uz0mdywM2jmE9B1JEIt8U/view?usp=sharing
> >
> > If you fix this issue, please add the following tag to the commit:
> > Reported-by: Hao Sun <sunhao.th@...il.com>
> >
> > loop1: detected capacity change from 0 to 32768
> > BTRFS info (device loop1): disk space caching is enabled
> > BTRFS info (device loop1): has skinny extents
> > BTRFS info (device loop1): enabling ssd optimizations
> > FAULT_INJECTION: forcing a failure.
> > name failslab, interval 1, probability 0, space 0, times 0
> > CPU: 1 PID: 25852 Comm: syz-executor Not tainted 5.15.0-rc1 #16
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> >   __dump_stack lib/dump_stack.c:88 [inline]
> >   dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
> >   fail_dump lib/fault-inject.c:52 [inline]
> >   should_fail+0x13c/0x160 lib/fault-inject.c:146
> >   should_failslab+0x5/0x10 mm/slab_common.c:1328
> >   slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
> >   slab_alloc_node mm/slub.c:3120 [inline]
> >   slab_alloc mm/slub.c:3214 [inline]
> >   kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
> >   alloc_extent_state+0x1e/0x1c0 fs/btrfs/extent_io.c:340
>
> This is one of the core systems btrfs uses, and we really don't want
> it to fail.
>
> In fact, it does some preallocation to prevent failure.
>
> But in the error injection case, we can still hit the BUG_ON() that is
> used to catch ENOMEM.
>

Hello,

The fuzzer repeatedly triggered the following crashes when fault
injection was enabled.

HEAD commit: 92477dd1faa6 Merge tag 's390-5.15-ebpf-jit-fixes'
git tree: upstream
kernel config: https://drive.google.com/file/d/1KgvcM8i_3hQiOL3fUh3JFpYNQM4itvV4/view?usp=sharing
[1] kernel BUG in btrfs_free_tree_block (fs/btrfs/extent-tree.c:3297):
https://paste.ubuntu.com/p/ZtzVKWbcGm/
[2] kernel BUG in clear_state_bit (fs/btrfs/extent_io.c:658):
https://paste.ubuntu.com/p/hps2wXPG2b/
[3] kernel BUG in set_extent_bit (fs/btrfs/extent_io.c:1021):
https://paste.ubuntu.com/p/dcptjYYxgd/
[4] kernel BUG in set_state_bits (fs/btrfs/extent_io.c:939):
https://paste.ubuntu.com/p/NV9qtKB4KZ/

All of the above crashes were triggered directly by the `BUG_ON()` macro
at the corresponding location.
Most of these `BUG_ON()` checks were hit because of `ENOMEM` caused by
the injected allocation failures.
Would it be better for btrfs to handle the `ENOMEM` error, e.g. by
returning it gracefully, rather than panicking the kernel?
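
To illustrate the question, here is a minimal userspace sketch, not btrfs
code. All names in it (`range_state`, `alloc_state`, `split_state`,
`fail_next_alloc`) are hypothetical stand-ins chosen for illustration, and
`fail_next_alloc` only imitates what failslab does in the kernel. It shows
a split helper that reports `-ENOMEM` to its caller at the point where a
`BUG_ON()` on a failed allocation would otherwise stop the machine.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent state record; not the btrfs struct. */
struct range_state {
	unsigned long start;
	unsigned long end;	/* inclusive */
	unsigned int bits;
};

/* Userspace imitation of failslab: when set, the next allocation fails. */
static bool fail_next_alloc;

static struct range_state *alloc_state(void)
{
	if (fail_next_alloc) {
		fail_next_alloc = false;
		return NULL;
	}
	return calloc(1, sizeof(struct range_state));
}

/*
 * Split *orig at 'at': on success *out_new covers [start, at - 1] and
 * orig covers [at, end]. Returns 0 on success, or -ENOMEM with orig and
 * *out_new left untouched.
 */
static int split_state(struct range_state *orig, unsigned long at,
		       struct range_state **out_new)
{
	struct range_state *prealloc;

	/* Caller contract: the split point must lie inside the range. */
	assert(at > orig->start && at <= orig->end);

	prealloc = alloc_state();
	if (!prealloc)
		return -ENOMEM;	/* a BUG_ON(!prealloc) would crash here instead */

	prealloc->start = orig->start;
	prealloc->end = at - 1;
	prealloc->bits = orig->bits;
	orig->start = at;
	*out_new = prealloc;
	return 0;
}
```

With this shape the caller sees the failure as an ordinary return value and
can unwind. Whether every btrfs call site can unwind safely at that point is
a separate question that this sketch does not answer.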

Regards
Hao