[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87mv0aprd8.fsf@notabene.neil.brown.name>
Date: Thu, 15 Feb 2018 19:52:03 +1100
From: NeilBrown <neilb@...e.com>
To: Milan Broz <gmazyland@...il.com>,
Mike Snitzer <snitzer@...hat.com>,
Mikulas Patocka <mpatocka@...hat.com>
Cc: device-mapper development <dm-devel@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [dm-devel] DM Regression in 4.16-rc1 - read() returns data when it shouldn't
On Thu, Feb 15 2018, Milan Broz wrote:
> On 02/15/2018 01:07 AM, NeilBrown wrote:
>>
>> And looking again at the code, it doesn't seem so bad. I was worried
>> about reassigning ci.io->orig_bio after calling
>> __split_and_process_non_flush(), but the important thing is to set it
>> before the last dec_pending() call, and the code gets that right.
>>
>> I think I've found the problem though:
>>
>> /* done with normal IO or empty flush */
>> bio->bi_status = io_error;
>>
>> When we have chained bios, there are multiple sources for error which
>> need to be merged into bi_status: all bios that were chained to this
>> one, and this bio itself.
>>
>> __bio_chain_endio uses:
>>
>> if (!parent->bi_status)
>> parent->bi_status = bio->bi_status;
>>
>> so the new status won't over-ride the old status (unless there are
>> races....), but dm doesn't do that - I guess it didn't have to worry
>> about chains before.
>>
>> So this:
>>
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index d6de00f367ef..68136806d365 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -903,7 +903,8 @@ static void dec_pending(struct dm_io *io, blk_status_t error)
>> queue_io(md, bio);
>> } else {
>> /* done with normal IO or empty flush */
>> - bio->bi_status = io_error;
>> + if (io_error)
>> + bio->bi_status = io_error;
>> bio_endio(bio);
>> }
>> }
>>
>
> Hi,
>
> well, it indeed fixes the reported problem, thanks.
> But I just tested the first (dm.c) change above.
>
> The important part is also that if the error is hits bio later, it will produce
> short read (and not fail of the whole operation), this can be easily tested if we switch
> the error and the zero target in that reproducer
> //system("echo -e \'0 10000000 error\'\\\\n\'10000000 1000000 zero\' | dmsetup create test");
> system("echo -e \'0 10000000 zero\'\\\\n\'10000000 1000000 error\' | dmsetup create test");
>
> So it seems your patch works.
Excellent - thanks.
>
> I am not sure about this part though:
>
>> should fix it. And __bio_chain_endio should probably be
>>
>> diff --git a/block/bio.c b/block/bio.c
>> index e1708db48258..ad77140edc6f 100644
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -312,7 +312,7 @@ static struct bio *__bio_chain_endio(struct bio *bio)
>> {
>> struct bio *parent = bio->bi_private;
>>
>> - if (!parent->bi_status)
>> + if (bio->bi_status)
>> parent->bi_status = bio->bi_status;
>
> Shouldn't it be
> if (!parent->bi_status && bio->bi_status)
> parent->bi_status = bio->bi_status;
> ?
That wouldn't hurt, but I don't see how it would help.
It would still be racy as two chained bios could complete at
the same time, so ->bi_status much change between test and set.
>
> Maybe one rephrased question: what about priorities for errors (bio statuses)?
>
> For example, part of a bio fails with EILSEQ (data integrity check error, BLK_STS_PROTECTION),
> part with EIO (media error, BLK_STS_IOERR). What should be reported for parent bio?
> (I would expect I always get EIO here because this is hard, uncorrectable error.)
If there was a well defined ordering, then we could easily write to code
ensure the highest priority status wins, but I don't think there is any
such ordering.
Certainly dec_pending() in dm.c already faces the possibility that it will
be called multiple times with different errors, but doesn't consider any
ordering (except relating to BLK_STS_DM_REQUEUE), when assigning a
status code to io->status.
>
> Anyway, thanks for the fix!
And thanks for the report.
NeilBrown
>
> Milan
>
>
>> bio_put(bio);
>> return parent;
>>
>> to remove the chance that a race will hide a real error.
>>
>> Milan: could you please check that the patch fixes the dm-crypt bug?
>> I've confirmed that it fixes your test case.
>> When you confirm, I'll send a proper patch.
>>
>> I guess I should audit all code that assigns to ->bi_status and make
>> sure nothing ever assigns 0 to it.
>>
>> Thanks,
>> NeilBrown
>>
>
> --
> dm-devel mailing list
> dm-devel@...hat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
Download attachment "signature.asc" of type "application/pgp-signature" (833 bytes)
Powered by blists - more mailing lists