Message-ID: <47226A75.1020008@oracle.com>
Date: Fri, 26 Oct 2007 15:30:13 -0700
From: Zach Brown <zach.brown@...cle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Karl Schendel <kschendel@...allegro.com>,
Zach Brown <zach.brown@...cle.com>,
Benjamin LaHaise <bcrl@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Nick Piggin <nickpiggin@...oo.com.au>,
Leonid Ananiev <leonid.i.ananiev@...ux.intel.com>,
Chris Mason <chris.mason@...cle.com>
Subject: Re: [PATCH] Fix bad data from non-direct-io read after direct-io write

Linus Torvalds wrote:
> Hmm. If I read this right, this bug seems to have been introduced by
> commit 65b8291c4000e5f38fc94fb2ca0cb7e8683c8a1b ("dio: invalidate clean
> pages before dio write") back in March.

Agreed. And it's a really dumb bug. ->direct_IO will almost always
return -EIOCBQUEUED for aio dio, so it won't be invalidating for aio dio
writes. (Notice that the testing in that commit mentions two racing
processes; I bet US$1M that I only tested sync dio :/)

I think that test should be changed to:

	if (retval < 0 && retval != -EIOCBQUEUED)
		goto out;
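
For reference, that test sits right after the ->direct_IO() call in
generic_file_direct_IO() in mm/filemap.c. Roughly, and this is a sketch
from memory rather than a verbatim quote of the tree:

	retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs);

	/*
	 * A sync dio write returns a byte count or a real error here,
	 * but an aio dio write returns -EIOCBQUEUED.  The current
	 * "if (retval) goto out;" treats that like an error and skips
	 * the post-write invalidation for every aio dio write.
	 */
	if (retval < 0 && retval != -EIOCBQUEUED)
		goto out;
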
> However, with both the old and the new code _and_ with your patch, the
> return code - in case the invalidate failed - was corrupted. So we may
> actually end up doing some IO, but then returning the "wrong" error code
> from the invalidate. Hmm?

If the invalidation fails, then the app is left with stale data in the
page cache and current data on disk. The return code corruption you're
referring to was intended to communicate this scary situation to the app
with EIO.
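
Concretely, that's the post-write invalidation in
generic_file_direct_IO(). Again a from-memory sketch; the
invalidate_inode_pages2_range() call is real, the surrounding details
are approximate:

	if (rw == WRITE && mapping->nrpages) {
		pgoff_t end = (offset + write_len - 1) >> PAGE_CACHE_SHIFT;
		int err = invalidate_inode_pages2_range(mapping,
					offset >> PAGE_CACHE_SHIFT, end);
		/*
		 * The write made it to disk, but stale clean pages
		 * are still sitting in the page cache.  Clobber the
		 * positive byte count with the error so the app at
		 * least hears about the incoherency.
		 */
		if (err)
			retval = err;
	}
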
It sucks. Does it suck more than returning success for the dio write
when later buffered reads will return stale data? I dunno. What does
the peanut gallery think?

> And maybe some day we can all agree that direct_IO is crap and should not
> be done.

Chris (Mason) and I certainly love the idea of getting rid of
fs/direct-io.c. Getting O_DIRECT working with the page-granular
buffered locking rules while doing large (and, as far as I know,
potentially sector-granular) IOs without noticeable performance
regressions is a mess.

- z