Message-ID: <CAHc6FU5KQBpJOWcx0uiE1U5vJoON147wFMUn0oWzUmzSQajirw@mail.gmail.com>
Date: Thu, 4 Dec 2025 13:01:21 +0100
From: Andreas Gruenbacher <agruenba@...hat.com>
To: zhangshida <starzhangzsd@...il.com>
Cc: Johannes.Thumshirn@....com, hch@...radead.org, ming.lei@...hat.com, 
	hsiangkao@...ux.alibaba.com, csander@...estorage.com, colyli@...as.com, 
	linux-block@...r.kernel.org, linux-bcache@...r.kernel.org, 
	linux-kernel@...r.kernel.org, zhangshida@...inos.cn, 
	Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v5 3/3] block: prevent race condition on bi_status in __bio_chain_endio

On Thu, Dec 4, 2025 at 3:48 AM zhangshida <starzhangzsd@...il.com> wrote:
> From: Shida Zhang <zhangshida@...inos.cn>
>
> Andreas point out that multiple completions can race setting
> bi_status.

What I've actually pointed out is that the '!parent->bi_status' check
in this statement is an unnecessary optimization that can be removed.
But this is not what this discussion is mainly about anymore.

In the current code, multiple completions can race setting bi_status,
but that is fine as long as bi_status is never set to 0 during bio
completion. The effect is that when there are multiple errors, the
bi_status field of the final bio will eventually be set to an error
code, but we don't know which error code will win. This all works
correctly today, and there is no race to fix because the race is
intentional.
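
As a rough illustration of that invariant, here is a single-threaded
userspace model of the existing check-and-assign. The struct model_bio
type and both helper names are made up for this sketch and are not
kernel code; it only shows that the parent's status, once nonzero, is
never cleared again, whichever child completes first.

```c
#include <assert.h>

/* Userspace stand-in for struct bio; only the two fields discussed here. */
struct model_bio {
	int bi_status;                /* 0 means success, nonzero is an error */
	struct model_bio *bi_private; /* parent bio in the chain */
};

/* Mirrors the check-and-assign currently done in __bio_chain_endio(). */
static struct model_bio *model_chain_endio(struct model_bio *bio)
{
	struct model_bio *parent = bio->bi_private;

	if (bio->bi_status && !parent->bi_status)
		parent->bi_status = bio->bi_status;
	return parent;
}

/* Completes two children against one parent; returns the parent's status. */
static int model_complete_two(int first_status, int second_status)
{
	struct model_bio parent = { 0, 0 };
	struct model_bio a = { first_status, &parent };
	struct model_bio b = { second_status, &parent };

	model_chain_endio(&a);
	model_chain_endio(&b);
	return parent.bi_status;
}
```

With two failing children the result is always some nonzero error code;
in a truly concurrent run, which of the two codes ends up in the parent
is not defined.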

> If __bio_chain_endio() is called concurrently from multiple threads
> accessing the same parent bio, it should use WRITE_ONCE()/READ_ONCE()
> to access parent->bi_status and avoid data races.
>
> On x86 and ARM, these macros compile to the same instruction as a
> normal write, but they may be required on other architectures to
> prevent tearing, and to ensure the compiler does not add or remove
> memory accesses under the assumption that the values are not accessed
> concurrently.

WRITE_ONCE() and READ_ONCE() also prevent the compiler from reordering
operations. Even when the compiler doesn't seem to do anything nasty
at the moment, it would probably still be worthwhile to use
WRITE_ONCE() for setting bi_status throughout the code. But that's
beyond the scope of this patch, and it calls for more than a global
search and replace job.

> Adopting a cmpxchg approach, as used in other code paths, resolves all
> these issues, as suggested by Christoph.

No, the cmpxchg() doesn't actually achieve anything, it only makes
things worse. For example, when there is an A -> B chain, we can end
up with the following sequence of events:

  - A fails, sets A->bi_status, and calls bio_endio(A).
  - B->bi_status is still 0, so bio_endio(A) sets B->bi_status to A->bi_status.
  - B fails and sets B->bi_status, OVERRIDING the value of A->bi_status.
  - bio_endio(B) calls B->bi_end_io().

Things get worse in an A -> B -> C chain, but I've already mentioned
that earlier in this thread.
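
The A -> B sequence above can be replayed in a small userspace sketch.
It uses C11 atomics as a stand-in for the kernel's cmpxchg(); struct
cx_bio and the cx_* helpers are hypothetical names for this
illustration only, and the error codes are arbitrary nonzero integers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for struct bio with an atomic status field. */
struct cx_bio {
	atomic_int bi_status;      /* 0 means success, nonzero is an error */
	struct cx_bio *bi_private; /* parent bio in the chain */
};

/* Models the patched __bio_chain_endio(): cmpxchg(&parent->bi_status, 0, ...). */
static void cx_chain_endio(struct cx_bio *bio)
{
	struct cx_bio *parent = bio->bi_private;
	int status = atomic_load(&bio->bi_status);
	int expected = 0;

	if (status)
		atomic_compare_exchange_strong(&parent->bi_status,
					       &expected, status);
}

/* Replays the four steps listed above; returns what B's end_io would see. */
static int cx_replay_a_then_b(int a_err, int b_err)
{
	struct cx_bio a, b;

	atomic_init(&b.bi_status, 0);
	b.bi_private = 0;
	atomic_init(&a.bi_status, 0);
	a.bi_private = &b;

	atomic_store(&a.bi_status, a_err); /* A fails and sets A->bi_status */
	cx_chain_endio(&a);                /* B->bi_status was 0, now a_err */
	atomic_store(&b.bi_status, b_err); /* B fails: unconditional store */
	return atomic_load(&b.bi_status);
}
```

In this model the error from A is lost as soon as B stores its own
status, because only the chain propagation goes through the
compare-and-exchange while B's own failure path does not.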

So again, the cmpxchg() is unnecessary, and it is also harmful
because it suggests that there is some form of synchronization that
doesn't exist. The btrfs code that the cmpxchg() was taken from seems
to implement actual first-failure-wins semantics, but this patch does
not.

The underlying question here is whether we want to change things so
that bi_status is set to the first error that occurs (probably first
in time, not first in the chain). If that is the goal, then we should
be explicit about it. Right now, I don't see the need.
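
For comparison, a minimal sketch of what explicit first-failure-wins
could look like, assuming every writer of the status field goes through
one compare-and-exchange helper. The helper fw_set_status_once() and
struct fw_bio are hypothetical and exist only in this userspace model;
this is not the btrfs code and not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for struct bio with an atomic status field. */
struct fw_bio {
	atomic_int bi_status;      /* 0 means success, nonzero is an error */
	struct fw_bio *bi_private; /* parent bio in the chain */
};

/* Hypothetical helper: record an error only if none has been recorded yet. */
static void fw_set_status_once(struct fw_bio *bio, int status)
{
	int expected = 0;

	if (status)
		atomic_compare_exchange_strong(&bio->bi_status,
					       &expected, status);
}

/* Same A -> B sequence, but every status write uses the helper. */
static int fw_replay_a_then_b(int a_err, int b_err)
{
	struct fw_bio a, b;

	atomic_init(&b.bi_status, 0);
	b.bi_private = 0;
	atomic_init(&a.bi_status, 0);
	a.bi_private = &b;

	fw_set_status_once(&a, a_err);                      /* A fails */
	fw_set_status_once(&b, atomic_load(&a.bi_status));  /* chain endio */
	fw_set_status_once(&b, b_err);                      /* B fails later */
	return atomic_load(&b.bi_status);
}
```

Here the error that was recorded first in time survives, but only
because B's own failure path also uses the helper, which is exactly the
part that a change limited to __bio_chain_endio() does not cover.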

Thanks,
Andreas

> Suggested-by: Andreas Gruenbacher <agruenba@...hat.com>
> Suggested-by: Christoph Hellwig <hch@...radead.org>
> Suggested-by: Caleb Sander Mateos <csander@...estorage.com>
> Reviewed-by: Christoph Hellwig <hch@....de>
> Signed-off-by: Shida Zhang <zhangshida@...inos.cn>
> ---
>  block/bio.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index cfb751dfcf5..51b57f9d8bd 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -314,8 +314,9 @@ static struct bio *__bio_chain_endio(struct bio *bio)
>  {
>         struct bio *parent = bio->bi_private;
>
> -       if (bio->bi_status && !parent->bi_status)
> -               parent->bi_status = bio->bi_status;
> +       if (bio->bi_status)
> +               cmpxchg(&parent->bi_status, 0, bio->bi_status);
> +
>         bio_put(bio);
>         return parent;
>  }
> --
> 2.34.1
>

