Message-ID: <CAHex0corhi5He41qdenWovgkVyD5Rpj0YLxeN+-ex4cOycqKdQ@mail.gmail.com>
Date: Mon, 12 Sep 2016 14:38:03 -0400
From: Jonathan Nicklin <jnicklin@...ckbridge.com>
To: linux-kernel@...r.kernel.org
Subject: BUG: aio/direct-io data corruption in 4.7

In 4.7.2, the kernel is acknowledging block writes that have not
completed to disk. To reproduce: create an MD array, run FIO (direct +
libaio), and pull all drives. FIO will continue to run without
receiving any I/O errors. I have also reproduced the bug using physical
drives. In that case, only a limited number of I/Os are incorrectly
acknowledged; FIO eventually receives an I/O error after the device
reference is removed.

The root cause of the problem is that dio_complete() does not
correctly propagate I/O errors in the is_async case. Specifically,
generic_write_sync() appears to be overwriting the return status
destined for ki_complete().
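
To illustrate the pattern described above, here is a small user-space
model. It is not the kernel source: the model_* names and struct
model_iocb are hypothetical stand-ins, and the "guarded" variant is only
one possible way to preserve the error, not necessarily the patch
offered in this message. The model assumes the sync helper returns the
byte count it was given when no sync error occurs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct kiocb; holds only what the model needs. */
struct model_iocb {
    long completed_status;  /* value user space would see from the completion */
};

/* Models ki_complete(): records the status delivered to the AIO completion. */
static void model_ki_complete(struct model_iocb *iocb, long status)
{
    iocb->completed_status = status;
}

/*
 * Models a write-sync helper that returns the byte count it was handed
 * (assumption: no sync is required, or the sync itself succeeds).
 */
static long model_write_sync(struct model_iocb *iocb, long count)
{
    assert(count >= 0);
    (void)iocb;
    return count;
}

/* Buggy pattern: the helper's return value replaces an earlier I/O error. */
static void model_dio_complete_buggy(struct model_iocb *iocb,
                                     long io_error, long transferred)
{
    long ret = io_error;            /* e.g. -EIO after the drives are pulled */

    if (ret == 0)
        ret = transferred;

    ret = model_write_sync(iocb, transferred);  /* overwrites -EIO */
    model_ki_complete(iocb, ret);
}

/* Guarded pattern: only call the helper when the I/O itself succeeded. */
static void model_dio_complete_guarded(struct model_iocb *iocb,
                                       long io_error, long transferred)
{
    long ret = io_error;

    if (ret == 0)
        ret = transferred;

    if (ret > 0)
        ret = model_write_sync(iocb, ret);      /* error path is skipped */
    model_ki_complete(iocb, ret);
}
```

Calling model_dio_complete_buggy() with io_error = -EIO and
transferred = 4096 delivers 4096 to the completion, which is the
"acknowledged but never written" behaviour FIO sees; the guarded variant
delivers -EIO.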

This bug appears to have been introduced by the following commit:

  Description: "fs: simplify the generic_write_sync prototype"
  Committed:   Apr 7, 2016
  Hash:        e259221763a40403d5bb232209998e8c45804ab8
  Affects:     4.7-rc1 - master

I have confirmed a fix for the AIO/Direct-IO failure condition but
have not reviewed the rest of the changes associated with that commit.
If you would like a small patch for direct-io.c, let me know.

Regards,
-Jonathan