[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bv4dfgmq4wmfcon2thkvhthqjlrpr5h4nmhfuusj4lh2wrj5ao@ckmvt63s67r6>
Date: Thu, 24 Oct 2024 17:36:47 -0400
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: John Stoffel <john@...ffel.org>
Cc: manas18244@...td.ac.in, linux-bcachefs@...r.kernel.org,
linux-kernel@...r.kernel.org, Anup Sharma <anupnewsmail@...il.com>,
Shuah Khan <skhan@...uxfoundation.org>, syzbot+e8eff054face85d7ea41@...kaller.appspotmail.com
Subject: Re: [PATCH] Revert "bcachefs: Add asserts to
bch2_dev_btree_bitmap_marked_sectors()"
On Thu, Oct 24, 2024 at 02:30:34PM -0400, John Stoffel wrote:
> >>>>> "Kent" == Kent Overstreet <kent.overstreet@...ux.dev> writes:
>
> > On Mon, Oct 21, 2024 at 10:18:57PM +0530, Manas via B4 Relay wrote:
> >> From: Manas <manas18244@...td.ac.in>
> >>
> >> This reverts commit 60f2b1bcf519416dbffee219132aa949d0c39d0e.
> >>
> >> This syzbot bug[1] is triggered due to the BUG_ON assertions added in
> >> __bch2_dev_btree_bitmap_mark. During runtime, m->btree_bitmap_shift is
> >> 63 '?'. This triggers both the assertions.
>
> > The BUG_ON() doesn't need to be deleted; we just need to fix the
> > validation so it doesn't fire (it doesn't particularly matter if it's
> > removed or not, ubsan would catch it without the BUG_ON()).
>
> Shouldn't the BUG_ON() be replaced with making the filesystem readonly
> instead if you're going to keep it? I'd rather have the filesystem
> still be mounted and able to be read, but not writeable, instead of
> having my system crash before I can do anything.
Not in this case. In general, if there's a chance of the BUG_ON()
hitting in the wild after the code has passed testing then it shouldn't
be a BUG_ON(), but this is low level validation that we're relying on.
In general higher level code absolutely requires that the low level
validation is correct, because if it's not that will trigger all sorts
of undefined behaviour in the upper layers.
By "low level validation" I mean _only_ the validate functions that
check simple invariants within a single data type that is read or
written atomically to disk - "is data type garbage or not".
If it's an invariant that involves multiple objects - "does this extent
refer to a valid device/snapshot ID" - that's not something we can check
in validate, because then an extent or what have you would become
invalid depending on what happens in the rest of the filesystem. Those
sorts of checks do in fact need proper error paths.
Powered by blists - more mailing lists