[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMeeMh-6KMLgriX_7KT52ynjBMyT9yDWSMKv6YXW+yDpvv0=wA@mail.gmail.com>
Date: Tue, 11 Jun 2019 22:56:42 -0400
From: John Dorminy <jdorminy@...hat.com>
To: Mike Snitzer <snitzer@...hat.com>
Cc: Jens Axboe <axboe@...nel.dk>, NeilBrown <neilb@...e.com>,
linux-block@...r.kernel.org,
device-mapper development <dm-devel@...hat.com>,
Milan Broz <gmazyland@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: block: be more careful about status in __bio_chain_endio
Having studied the code in question more thoroughly, my first email's
scenario can't occur, I believe. bio_put() contains a
atomic_dec_and_test(), which (according to
Documentation/atomic_t.txt), having a return value, are fully ordered
and therefore impose a general memory barrier. A general memory
barrier means that no value set before bio_put() may be observed
afterwards, if I accurately understand
Documentation/memory_barriers.txt. Thus, the scenario I imagined, with
a load from inside __bio_chain_endio() being visible in bi_end_io(),
cannot occur.
However, the second email's scenario, of a compiler making two stores
out of a conditional store, is still accurate according to my
understanding. I continue to believe setting parent->bi_status needs
to be a WRITE_ONCE() and any reading of parent->bi_status before
bio_put() needs to be at least a READ_ONCE(). The last thing in that
email is wrong for the same reason that the first email couldn't
happen: the bio_put() general barrier means later accesses to the
field from a single thread will freshly read the field and thereby not
get an incorrectly cached value.
As a concrete proposal, I believe either of the following work and fix
the race NeilB described, as well as the compiler (or CPU) race I
described:
- if (!parent->bi_status)
- parent->bi_status = bio->bi_status;
+ if (bio->bi_status)
+ WRITE_ONCE(parent->bi_status, bio->bi_status);
--or--
- if (!parent->bi_status)
- parent->bi_status = bio->bi_status;
+ if (!READ_ONCE(parent->bi_status) && bio->bi_status)
+ WRITE_ONCE(parent->bi_status, bio->bi_status);
I believe the second of these might, but is not guaranteed to,
preserve the first error observed in a child; I believe if you want to
definitely save the first error you need an atomic.
Powered by blists - more mailing lists