[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <26396.3323.858567.266417@quad.stoffel.home>
Date: Fri, 25 Oct 2024 17:26:19 -0400
From: "John Stoffel" <john@...ffel.org>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: John Stoffel <john@...ffel.org>,
manas18244@...td.ac.in,
linux-bcachefs@...r.kernel.org,
linux-kernel@...r.kernel.org,
Anup Sharma <anupnewsmail@...il.com>,
Shuah Khan <skhan@...uxfoundation.org>,
syzbot+e8eff054face85d7ea41@...kaller.appspotmail.com
Subject: Re: [PATCH] Revert "bcachefs: Add asserts to
bch2_dev_btree_bitmap_marked_sectors()"
>>>>> "Kent" == Kent Overstreet <kent.overstreet@...ux.dev> writes:
> On Thu, Oct 24, 2024 at 02:30:34PM -0400, John Stoffel wrote:
>> >>>>> "Kent" == Kent Overstreet <kent.overstreet@...ux.dev> writes:
>>
>> > On Mon, Oct 21, 2024 at 10:18:57PM +0530, Manas via B4 Relay wrote:
>> >> From: Manas <manas18244@...td.ac.in>
>> >>
>> >> This reverts commit 60f2b1bcf519416dbffee219132aa949d0c39d0e.
>> >>
>> >> This syzbot bug[1] is triggered due to the BUG_ON assertions added in
>> >> __bch2_dev_btree_bitmap_mark. During runtime, m->btree_bitmap_shift is
>> >> 63 '?'. This triggers both the assertions.
>>
>> > The BUG_ON() doesn't need to be deleted; we just need to fix the
>> > validation so it doesn't fire (it doesn't particularly matter if it's
>> > removed or not, ubsan would catch it without the BUG_ON()).
>>
>> Shouldn't the BUG_ON() be replaced with making the filesystem readonly
>> instead if you're going to keep it? I'd rather have the filesystem
>> still be mounted and able to be read, but not writeable, instead of
>> having my system crash before I can do anything.
> Not in this case. In general, if there's a chance of the BUG_ON()
> hitting in the wild after the code has passed testing then it
> shouldn't be a BUG_ON(), but this is low level validation that we're
> relying on.
So I'm having a hard time parsing this reply. And I don't think you
make a good case for leaving or even having a BUG_ON() here at all.
If there's a chance of it hitting in the wild, it should be removed.
But you don't want to remove it because it will never hit? That's
just lazy... :-) I just don't see why a filesystem should have the
opportunity to kill an entire system because that one filesystem has
problems.
> In general higher level code absolutely requires that the low level
> validation is correct, because if it's not that will trigger all sorts
> of undefined behaviour in the upper layers.
> By "low level validation" I mean _only_ the validate functions that
> check simple invariants within a single data type that is read or
> written atomically to disk - "is data type garbage or not".
Sure.
> If it's an invariant that involves multiple objects - "does this
> extent refer to a valid device/snapshot ID" - that's not something
> we can check in validate, because then an extent or what have you
> would become invalid depending on what happens in the rest of the
> filesystem. Those sorts of checks do in fact need proper error
> paths.
Right.
Powered by blists - more mailing lists