lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANubcdWBF5tCfrutAOiUkFaZb=9s4=bMKzi7dSwQxTGbC_3_1Q@mail.gmail.com>
Date: Wed, 3 Dec 2025 11:09:36 +0800
From: Stephen Zhang <starzhangzsd@...il.com>
To: Andreas Gruenbacher <agruenba@...hat.com>
Cc: Christoph Hellwig <hch@....de>, Johannes.Thumshirn@....com, ming.lei@...hat.com, 
	hsiangkao@...ux.alibaba.com, csander@...estorage.com, colyli@...as.com, 
	linux-block@...r.kernel.org, linux-bcache@...r.kernel.org, 
	linux-kernel@...r.kernel.org, zhangshida@...inos.cn
Subject: Re: [PATCH v4 3/3] block: prevent race condition on bi_status in __bio_chain_endio

Stephen Zhang <starzhangzsd@...il.com> 于2025年12月3日周三 09:51写道:
>
> Andreas Gruenbacher <agruenba@...hat.com> 于2025年12月3日周三 05:15写道:
> >
> > On Tue, Dec 2, 2025 at 6:48 AM Christoph Hellwig <hch@....de> wrote:
> > > On Mon, Dec 01, 2025 at 02:07:07PM +0100, Andreas Gruenbacher wrote:
> > > > On Mon, Dec 1, 2025 at 12:25 PM Christoph Hellwig <hch@...radead.org> wrote:
> > > > > On Mon, Dec 01, 2025 at 11:22:32AM +0100, Andreas Gruenbacher wrote:
> > > > > > > -       if (bio->bi_status && !parent->bi_status)
> > > > > > > -               parent->bi_status = bio->bi_status;
> > > > > > > +       if (bio->bi_status)
> > > > > > > +               cmpxchg(&parent->bi_status, 0, bio->bi_status);
> > > > > >
> > > > > > Hmm. I don't think cmpxchg() actually is of any value here: for all
> > > > > > the chained bios, bi_status is initialized to 0, and it is only set
> > > > > > again (to a non-0 value) when a failure occurs. When there are
> > > > > > multiple failures, we only need to make sure that one of those
> > > > > > failures is eventually reported, but for that, a simple assignment is
> > > > > > enough here.
> > > > >
> > > > > A simple assignment doesn't guarantee atomicy.
> > > >
> > > > Well, we've already discussed that bi_status is a single byte and so
> > > > tearing won't be an issue. Otherwise, WRITE_ONCE() would still be
> > > > enough here.
> > >
> > > No.  At least older alpha can tear byte updates as they need a
> > > read-modify-write cycle.
> >
> > I know this used to be a thing in the past, but to see that none of
> > that is relevant anymore today, have a look at where [*] quotes the
> > C11 standard:
> >
> >         memory location
> >                 either an object of scalar type, or a maximal sequence
> >                 of adjacent bit-fields all having nonzero width
> >
> >                 NOTE 1: Two threads of execution can update and access
> >                 separate memory locations without interfering with
> >                 each other.
> >
> >                 NOTE 2: A bit-field and an adjacent non-bit-field member
> >                 are in separate memory locations. The same applies
> >                 to two bit-fields, if one is declared inside a nested
> >                 structure declaration and the other is not, or if the two
> >                 are separated by a zero-length bit-field declaration,
> >                 or if they are separated by a non-bit-field member
> >                 declaration. It is not safe to concurrently update two
> >                 bit-fields in the same structure if all members declared
> >                 between them are also bit-fields, no matter what the
> >                 sizes of those intervening bit-fields happen to be.
> >
> > [*] Documentation/memory-barriers.txt
> >
> > > But even on normal x86 the check and the update would be racy.
> >
> > There is no check and update (RMW), though. Quoting what I wrote
> > earlier in this thread:
> >
> > On Mon, Dec 1, 2025 at 11:22 AM Andreas Gruenbacher <agruenba@...hat.com> wrote:
> > > Hmm. I don't think cmpxchg() actually is of any value here: for all
> > > the chained bios, bi_status is initialized to 0, and it is only set
> > > again (to a non-0 value) when a failure occurs. When there are
> > > multiple failures, we only need to make sure that one of those
> > > failures is eventually reported, but for that, a simple assignment is
> > > enough here. The cmpxchg() won't guarantee that a specific error value
> > > will survive; it all still depends on the timing. The cmpxchg() only
> > > makes it look like something special is happening here with respect to
> > > ordering.
> >
> > So with or without the cmpxchg(), if there are multiple errors, we
> > won't know which bi_status code will survive, but we do know that we
> > will end up with one of those error codes.
> >
>
> Thank you for sharing your insights—I found the discussion very enlightening.
>
> While I agree with Andreas’s perspective, I also very much appreciate
> the clarity
> and precision offered by the cmpxchg() approach. That’s why when Christoph
> suggested it, I was happy to incorporate it into the code.
>
> But a cmpxchg is a little bit redundant here.
> so we will change it to the simple assignment:
>
> -       if (bio->bi_status && !parent->bi_status)
>                  parent->bi_status = bio->bi_status;
> +       if (bio->bi_status)
>                  parent->bi_status = bio->bi_status;
>
> I will integrate this discussion into the commit message, it is very insightful.
>

Hi,

I’ve been reconsidering the two approaches for the upcoming patch revision.
Essentially, we’re comparing two methods:
A:
        if (bio->bi_status)
                   parent->bi_status = bio->bi_status;
B:
        if (bio->bi_status)
                cmpxchg(&parent->bi_status, 0, bio->bi_status);

Both appear correct, but B seems a little bit redundant here.
Upon further reflection, I’ve noticed a subtle difference:
A unconditionally writes to parent->bi_status when bio->bi_status is non-zero,
regardless of the current value of parent->bi_status.
B uses cmpxchg to only update parent->bi_status if it is still zero.

Thus, B could avoid unnecessary writes in cases where parent->bi_status has
already been set to a non-zero value.

Do you think this optimization would be beneficial in practice, or is
the difference
negligible?

Thanks,
Shida

> Thanks,
> Shida
>
> > Andreas
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ