Message-ID: <aWiyx-tXTp81yfBx@moria.home.lan>
Date: Thu, 15 Jan 2026 04:28:49 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Stephen Zhang <starzhangzsd@...il.com>
Cc: colyli@...as.com, axboe@...nel.dk, sashal@...nel.org, 
	linux-bcache@...r.kernel.org, linux-kernel@...r.kernel.org, zhangshida@...inos.cn, 
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] bcache: fix double bio_endio completion in
 detached_dev_end_io

On Thu, Jan 15, 2026 at 05:17:39PM +0800, Stephen Zhang wrote:
> Kent Overstreet <kent.overstreet@...ux.dev> wrote on Thu, Jan 15, 2026 at 16:59:
> >
> > On Thu, Jan 15, 2026 at 04:06:53PM +0800, Stephen Zhang wrote:
> > > zhangshida <starzhangzsd@...il.com> wrote on Thu, Jan 15, 2026 at 15:48:
> > > >
> > > > From: Shida Zhang <zhangshida@...inos.cn>
> > > >
> > > > Commit 53280e398471 ("bcache: fix improper use of bi_end_io") attempted
> > > > to fix up bio completions by replacing manual bi_end_io calls with
> > > > bio_endio(). However, it introduced a double-completion bug in the
> > > > detached_dev path.
> > > >
> > > > In a normal completion path, the call stack is:
> > > >    blk_update_request
> > > >      bio_endio(bio)
> > > >        bio->bi_end_io(bio) -> detached_dev_end_io
> > > >          bio_endio(bio)    <- BUG: second call
> > > >
> > > > To fix this, detached_dev_end_io() must manually call the next completion
> > > > handler in the chain.
> > > >
> > > > However, in detached_dev_do_request(), if a discard is unsupported, the
> > > > bio is rejected before being submitted to the lower level. In this case,
> > > > we can use the standard bio_endio().
> > > >
> > > >    detached_dev_do_request
> > > >      bio_endio(bio)        <- Correct: starts completion for
> > > >                                 unsubmitted bio
> > > >
> > > > Fixes: 53280e398471 ("bcache: fix improper use of bi_end_io")
> > > > Signed-off-by: Shida Zhang <zhangshida@...inos.cn>
> > > > ---
> > > >  drivers/md/bcache/request.c | 11 +++++++++--
> > > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> > > > index 82fdea7dea7..ec712b5879f 100644
> > > > --- a/drivers/md/bcache/request.c
> > > > +++ b/drivers/md/bcache/request.c
> > > > @@ -1104,7 +1104,14 @@ static void detached_dev_end_io(struct bio *bio)
> > > >         }
> > > >
> > > >         kfree(ddip);
> > > > -       bio_endio(bio);
> > > > +       /*
> > > > +        * This is an exception where bio_endio() cannot be used.
> > > > +        * We are already called from within a bio_endio() stack;
> > > > +        * calling it again here would result in a double-completion
> > > > +        * (decrementing bi_remaining twice). We must call the
> > > > +        * original completion routine directly.
> > > > +        */
> > > > +       bio->bi_end_io(bio);
> > > >  }
> > > >
> > > >  static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
> > > > @@ -1136,7 +1143,7 @@ static void detached_dev_do_request(struct bcache_device *d, struct bio *bio,
> > > >
> > > >         if ((bio_op(bio) == REQ_OP_DISCARD) &&
> > > >             !bdev_max_discard_sectors(dc->bdev))
> > > > -               detached_dev_end_io(bio);
> > > > +               bio_endio(bio);
> > > >         else
> > > >                 submit_bio_noacct(bio);
> > > >  }
> > > > --
> > > > 2.34.1
> > > >
> > >
> > > Hi,
> > >
> > > My apologies for the late reply due to a delay in checking my working inbox.
> > > I see the issue mentioned in:
> > > https://lore.kernel.org/all/aWU2mO5v6RezmIpZ@moria.home.lan/
> > > this was indeed an oversight on my part.
> > >
> > > To resolve this quickly, I've prepared a direct fix for the
> > > double-completion bug.
> > > I hope this is better than a full revert.
> >
> > In general, it's just safer, simpler and saner to revert; reverting a
> > patch is not something to be avoided. If there's _any_ new trickiness
> > required in the fix, it's better to just revert than rush things.
> >
> > I revert or kick patches out - including my own - all the time.
> >
> > That said, this patch is good, you've got a comment explaining what's
> > going on. Christoph's version of just always cloning the bio is
> > definitely cleaner, but that's a bigger change,
> 
> Thank you for the feedback.
> 
> I sincerely hope that Christoph's version can resolve this issue properly, and
> that it helps remedy the regression I introduced. I appreciate everyone's
> patience and the efforts to address this.
> 
> Let me know if there's anything further needed from my side.

Thanks for being attentive, no worries about any of it.

It looks like from your patch there was an actual bug you were trying to
fix: bio_endio() not being called at all in this case.

> > > >         if ((bio_op(bio) == REQ_OP_DISCARD) &&
> > > >             !bdev_max_discard_sectors(dc->bdev))

That would have been good to highlight up front.
