Message-ID: <CAG_fn=XSQvdeFXCx2rsgdoUCyDV8t4LJBuRA8nKHKdCHrcWBYw@mail.gmail.com>
Date: Tue, 17 Sep 2024 12:09:19 +0200
From: Alexander Potapenko <glider@...gle.com>
To: Aleksandr Nogikh <nogikh@...gle.com>
Cc: Piotr Zalewski <pZ010001011111@...ton.me>, 
	syzbot <syzbot+6f655a60d3244d0c6718@...kaller.appspotmail.com>, 
	linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [bcachefs?] KMSAN: uninit-value in bch2_bkey_cmp_packed_inlined

On Tue, Sep 17, 2024 at 8:27 AM Aleksandr Nogikh <nogikh@...gle.com> wrote:
>
> +Alexander Potapenko
>
>
> On Tue, Sep 17, 2024 at 8:26 AM 'Piotr Zalewski' via syzkaller-bugs
> <syzkaller-bugs@...glegroups.com> wrote:
> >
> > Hello,
> >
> > On Saturday, September 14th, 2024 at 2:15 PM, syzbot <syzbot+6f655a60d3244d0c6718@...kaller.appspotmail.com> wrote:
> >
> > > Hello,
> > >
> > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > kernel panic: corrupted stack end in x64_sys_call
> > >
> > > bucket 0:127 gen 0 has wrong data_type: got free, should be sb, fixing
> > > bucket 0:127 gen 0 data type sb has wrong dirty_sectors: got 0, should be 256, fixing
> > > done
> > > bcachefs (loop0): going read-write
> > > bcachefs (loop0): journal_replay...
> > > Kernel panic - not syncing: corrupted stack end detected inside scheduler
> > > CPU: 0 UID: 0 PID: 5945 Comm: syz.0.15 Not tainted 6.11.0-rc7-syzkaller-g57719771a244-dirty #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
> > > Call Trace:
> > > <TASK>
> > >
> > > __dump_stack lib/dump_stack.c:93 [inline]
> > > dump_stack_lvl+0x216/0x2d0 lib/dump_stack.c:119
> > > dump_stack+0x1e/0x30 lib/dump_stack.c:128
> > > panic+0x4e2/0xcd0 kernel/panic.c:354
> > > schedule_debug kernel/sched/core.c:5745 [inline]
> >
> > I found the place where the kernel task's stack magic number gets
> > smashed; the backtrace is below. It seems to be KMSAN's fault.
> > Is this considered a bug?

Interesting, 18446744071600444244 is 0xffffffff82499354, which is the
get_shadow_origin_ptr() return address.
So we're indeed seeing a stack overflow in the instrumentation code.
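(A quick way to double-check that conversion is to print the decimal value
from the report as hex; a trivial userspace snippet, nothing kernel-specific:

    #include <stdio.h>

    int main(void)
    {
            /* decimal value taken from the report */
            printf("%#lx\n", 18446744071600444244UL); /* prints 0xffffffff82499354 */
            return 0;
    }
)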

Looking at vmlinux-b7718454 from
https://storage.googleapis.com/syzbot-assets/094db88ff1c2/vmlinux-b7718454.xz
(I am assuming it was used to test this patch), I see that a number of
functions from the report have quite big stack frames:

symbol_string
ffffffff8fc9b801: 48 81 ec 00 03 00 00 sub    $0x300,%rsp
bch2_path_get
ffffffff853a7ad1: 48 81 ec 60 01 00 00 sub    $0x160,%rsp
bch2_btree_path_traverse_one
ffffffff85399741: 48 81 ec 70 02 00 00 sub    $0x270,%rsp
bch2_bucket_alloc_set_trans
ffffffff852bf441: 48 81 ec 98 03 00 00 sub    $0x398,%rsp
__open_bucket_add_buckets
ffffffff852d128d: 48 81 ec 70 02 00 00 sub    $0x270,%rsp
bch2_alloc_sectors_start_trans
ffffffff852c25d1: 48 81 ec b0 01 00 00 sub    $0x1b0,%rsp
bch2_btree_update_start
ffffffff85456c1d: 48 81 ec 20 01 00 00 sub    $0x120,%rsp
__bch2_trans_commit
ffffffff85424541: 48 81 ec a0 01 00 00 sub    $0x1a0,%rsp
btree_write_buffer_flush_seq
ffffffff8548dd6d: 48 81 ec 10 02 00 00 sub    $0x210,%rsp
journal_flush_pins
ffffffff856f6bad: 48 81 ec 38 01 00 00 sub    $0x138,%rsp
bch2_fs_recovery
ffffffff8575cff1: 48 81 ec 78 01 00 00 sub    $0x178,%rsp
bch2_fs_get_tree
ffffffff855ac5c1: 48 81 ec e8 01 00 00 sub    $0x1e8,%rsp
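(Back-of-the-envelope: just the frames listed above add up to roughly
0x300+0x160+0x270+0x398+0x270+0x1b0+0x120+0x1a0+0x210+0x138+0x178+0x1e8 =
0x17f0, i.e. about 6 KiB of the default 16 KiB x86-64 kernel stack, before
counting anything else on the call path.)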

KASAN creates even bigger frames for these functions, but that's
because of the redzones it adds around local variables.
For KASAN we increase the default kernel stack sizes to account for
that, but we don't for KMSAN, because its effect on stack frame sizes
has usually been moderate.
But looking at the same stack frames in a binary built with
CONFIG_KMSAN=n now, I'm not seeing much lower values:

symbol_string
ffffffff8fd6d6e1: 48 81 ec 10 03 00 00 sub    $0x310,%rsp
bch2_path_get
ffffffff853ec4e1: 48 81 ec 68 01 00 00 sub    $0x168,%rsp
bch2_btree_path_traverse_one
ffffffff853de5a1: 48 81 ec 58 02 00 00 sub    $0x258,%rsp
bch2_bucket_alloc_set_trans
ffffffff853051d1: 48 81 ec b0 03 00 00 sub    $0x3b0,%rsp
__open_bucket_add_buckets
ffffffff8531759d: 48 81 ec 68 02 00 00 sub    $0x268,%rsp
bch2_alloc_sectors_start_trans
ffffffff85308de1: 48 81 ec b8 01 00 00 sub    $0x1b8,%rsp
bch2_btree_update_start
ffffffff8549a63d: 48 81 ec 20 01 00 00 sub    $0x120,%rsp
__bch2_trans_commit
ffffffff85468b51: 48 81 ec 80 01 00 00 sub    $0x180,%rsp
btree_write_buffer_flush_seq
ffffffff854d17fd: 48 81 ec 10 02 00 00 sub    $0x210,%rsp
journal_flush_pins
ffffffff8573c36d: 48 81 ec 30 01 00 00 sub    $0x130,%rsp
bch2_fs_recovery
ffffffff857a27d1: 48 81 ec 68 01 00 00 sub    $0x168,%rsp
bch2_fs_get_tree
ffffffff855f04d1: 48 81 ec e8 01 00 00 sub    $0x1e8,%rsp
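(Comparing the two listings, the per-function differences are within a few
tens of bytes in either direction, e.g. bch2_bucket_alloc_set_trans is 0x398
with KMSAN vs. 0x3b0 without, and symbol_string is 0x300 vs. 0x310, so the
instrumentation itself does not account for much of the frame size here.)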

I'll probably need to recalculate the overall stack bloat for KMSAN
builds and land something along the lines of
https://github.com/google/kmsan/commit/060de96aa5de0a95b42589920b64e9aa95af2151,
if needed.
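
For reference, the KASAN stack-size bump is done via KASAN_STACK_ORDER in
arch/x86/include/asm/page_64_types.h; a KMSAN counterpart would presumably
look roughly like the sketch below (my assumption, not the contents of the
linked commit, and the order value is just a placeholder):

    /* arch/x86/include/asm/page_64_types.h (sketch, not the actual patch) */
    #ifdef CONFIG_KMSAN
    /* Hypothetical: give KMSAN kernels larger stacks, as is done for KASAN. */
    #define KMSAN_STACK_ORDER 1
    #else
    #define KMSAN_STACK_ORDER 0
    #endif

    #define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER + KMSAN_STACK_ORDER)
    #define THREAD_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)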
