linux-kernel - Re: Fix potential data loss and corruption due to Incorrect BIO Chain Handling


Message-ID: <CANubcdUG_3VwagV-cSfhp4+95Dj_e-wkxegzCdmuNieWqrehug@mail.gmail.com>
Date: Mon, 24 Nov 2025 10:00:42 +0800
From: Stephen Zhang <starzhangzsd@...il.com>
To: Ming Lei <ming.lei@...hat.com>
Cc: Andreas Gruenbacher <agruenba@...hat.com>, linux-kernel@...r.kernel.org, 
	linux-block@...r.kernel.org, nvdimm@...ts.linux.dev, 
	virtualization@...ts.linux.dev, linux-nvme@...ts.infradead.org, 
	gfs2@...ts.linux.dev, ntfs3@...ts.linux.dev, linux-xfs@...r.kernel.org, 
	zhangshida@...inos.cn, Coly Li <colyli@...as.com>, linux-bcache@...r.kernel.org
Subject: Re: Fix potential data loss and corruption due to Incorrect BIO Chain Handling

On Mon, Nov 24, 2025 at 9:28 AM Stephen Zhang <starzhangzsd@...il.com> wrote:
>
> On Sun, Nov 23, 2025 at 9:49 PM Ming Lei <ming.lei@...hat.com> wrote:
> >
> > On Sat, Nov 22, 2025 at 03:56:58PM +0100, Andreas Gruenbacher wrote:
> > > On Sat, Nov 22, 2025 at 1:07 PM Ming Lei <ming.lei@...hat.com> wrote:
> > > > > static void bio_chain_endio(struct bio *bio)
> > > > > {
> > > > >         bio_endio(__bio_chain_endio(bio));
> > > > > }
> > > >
> > > > bio_chain_endio() is never really called; it can be thought of as a `flag`,
> > >
> > > That's probably where this stops being relevant for the problem
> > > reported by Stephen Zhang.
> > >
> > > > and it should have been defined as `WARN_ON_ONCE(1);` to avoid confusing people.
> > >
> > > But shouldn't bio_chain_endio() still be fixed to do the right thing
> > > if called directly, or alternatively, just BUG()? Warning and still
> > > doing the wrong thing seems a bit bizarre.
> >
> > IMO calling ->bi_end_io() directly shouldn't be encouraged.
> >
> > The only in-tree direct call user could be bcache, so is this reported
> > issue triggered on bcache?
> >

I still need to confirm the details. For now, suppose our analysis gives a
theoretical model that explains every observed symptom without inconsistency,
and that the real-world problem we hit exhibits exactly those symptoms.

In that case, the chance that the analysis is wrong is very low.

Even if bcache turns out not to be part of the running configuration, our
further investigation will still revolve around that analysis.

So what I want to explore is: does this analysis really hold up and explain
everything without inconsistencies, assuming we are free to introduce an
arbitrarily complex runtime configuration?
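
To make the scenario under discussion concrete, here is a minimal userspace
sketch, not kernel code. All toy_* names and run_scenario() are hypothetical.
The model assumes a three-level chain (C chained to B, B chained to A) and a
single counter standing in for __bi_remaining, and it omits status propagation
and reference dropping. It only illustrates why invoking a chained bio's
->bi_end_io() directly could complete the parent early: that path skips the
remaining-count check that bio_endio() would perform on the child first.
Whether this matches the reported problem is still to be confirmed.

```c
#include <stddef.h>

/* Toy stand-in for struct bio; every name here is hypothetical. */
struct toy_bio {
	int remaining;                     /* models __bi_remaining */
	int completed;                     /* set by the final completion handler */
	void (*end_io)(struct toy_bio *);  /* models ->bi_end_io */
	struct toy_bio *parent;            /* models ->bi_private of a chained bio */
};

static void toy_bio_endio(struct toy_bio *bio);

/* Shaped like the quoted bio_chain_endio(): just complete the parent. */
static void toy_chain_endio(struct toy_bio *bio)
{
	toy_bio_endio(bio->parent);
}

/* Completion handler of the top-level bio. */
static void toy_final_endio(struct toy_bio *bio)
{
	bio->completed = 1;
}

/* Completion entry point: drop one count, act only when it reaches zero. */
static void toy_bio_endio(struct toy_bio *bio)
{
	while (bio) {
		if (--bio->remaining > 0)
			return;                /* other completions still pending */
		if (bio->end_io != toy_chain_endio) {
			if (bio->end_io)
				bio->end_io(bio);
			return;
		}
		bio = bio->parent;             /* chained: move on to the parent */
	}
}

/* Chain child to parent; the parent now waits for one more completion. */
static void toy_bio_chain(struct toy_bio *child, struct toy_bio *parent)
{
	child->parent = parent;
	child->end_io = toy_chain_endio;
	parent->remaining++;
}

/*
 * Returns 1 if A is marked completed while C is still in flight, else 0.
 * direct_call != 0 invokes B's end_io directly instead of toy_bio_endio().
 */
int run_scenario(int direct_call)
{
	struct toy_bio a = { 1, 0, toy_final_endio, NULL };
	struct toy_bio b = { 1, 0, NULL, NULL };
	struct toy_bio c = { 1, 0, NULL, NULL };
	int early;

	toy_bio_chain(&b, &a);
	toy_bio_chain(&c, &b);

	if (direct_call)
		b.end_io(&b);          /* bypasses b's own remaining count */
	else
		toy_bio_endio(&b);     /* b waits until c has completed */
	toy_bio_endio(&a);             /* a's own I/O finishes */

	early = a.completed;           /* c has not completed yet */
	toy_bio_endio(&c);
	return early;
}
```

In this model, run_scenario(0) leaves A pending until C completes, while
run_scenario(1) marks A completed with C still outstanding.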

Thanks,
Shida

> > If bcache can't call bio_endio(), I think it is fine to fix
> > bio_chain_endio().
> >
> > >
> > > I also see direct bi_end_io calls in erofs_fileio_ki_complete(),
> > > erofs_fscache_bio_endio(), and erofs_fscache_submit_bio(), so those
> > > are at least confusing.
> >
> > These all look like FS bios (non-chained), so bio_chain_endio() shouldn't
> > be involved in the erofs code base.
> >
>
> Okay, will add that.
>
> Thanks,
> Shida
>
> >
> > Thanks,
> > Ming
> >