linux-kernel - Re: [PATCH] Revert "bcachefs: Add asserts to bch2_dev_btree_bitmap_marked

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <26396.3323.858567.266417@quad.stoffel.home>
Date: Fri, 25 Oct 2024 17:26:19 -0400
From: "John Stoffel" <john@...ffel.org>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: John Stoffel <john@...ffel.org>,
    manas18244@...td.ac.in,
    linux-bcachefs@...r.kernel.org,
    linux-kernel@...r.kernel.org,
    Anup Sharma <anupnewsmail@...il.com>,
    Shuah Khan <skhan@...uxfoundation.org>,
    syzbot+e8eff054face85d7ea41@...kaller.appspotmail.com
Subject: Re: [PATCH] Revert "bcachefs: Add asserts to
 bch2_dev_btree_bitmap_marked_sectors()"

>>>>> "Kent" == Kent Overstreet <kent.overstreet@...ux.dev> writes:

> On Thu, Oct 24, 2024 at 02:30:34PM -0400, John Stoffel wrote:
>> >>>>> "Kent" == Kent Overstreet <kent.overstreet@...ux.dev> writes:
>> 
>> > On Mon, Oct 21, 2024 at 10:18:57PM +0530, Manas via B4 Relay wrote:
>> >> From: Manas <manas18244@...td.ac.in>
>> >> 
>> >> This reverts commit 60f2b1bcf519416dbffee219132aa949d0c39d0e.
>> >> 
>> >> This syzbot bug[1] is triggered due to the BUG_ON assertions added in
>> >> __bch2_dev_btree_bitmap_mark. During runtime, m->btree_bitmap_shift is
>> >> 63 '?'. This triggers both the assertions.
>> 
>> > The BUG_ON() doesn't need to be deleted; we just need to fix the
>> > validation so it doesn't fire (it doesn't particularly matter if it's
>> > removed or not, ubsan would catch it without the BUG_ON()).
>> 
>> Shouldn't the BUG_ON() be replaced with making the filesystem readonly
>> instead if you're going to keep it?  I'd rather have the filesystem
>> still be mounted and able to be read, but not writeable, instead of
>> having my system crash before I can do anything.  

> Not in this case. In general, if there's a chance of the BUG_ON()
> hitting in the wild after the code has passed testing then it
> shouldn't be a BUG_ON(), but this is low level validation that we're
> relying on.

So I'm having a hard time parsing this reply.  And I don't think you
make a good case for leaving or even having a BUG_ON() here at all.
If there's a chance of it hitting in the wild, it should be removed.
But you don't want to remove it because it will never hit?  That's
just lazy... :-)   I just don't see why a filesystem should have the
opportunity to kill an entire system because that one filesystem has
problems.  

> In general higher level code absolutely requires that the low level
> validation is correct, because if it's not that will trigger all sorts
> of undefined behaviour in the upper layers.

> By "low level validation" I mean _only_ the validate functions that
> check simple invariants within a single data type that is read or
> written atomically to disk - "is data type garbage or not".

Sure.

> If it's an invariant that involves multiple objects - "does this
> extent refer to a valid device/snapshot ID" - that's not something
> we can check in validate, because then an extent or what have you
> would become invalid depending on what happens in the rest of the
> filesystem. Those sorts of checks do in fact need proper error
> paths.

Right.