lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <alpine.LFD.0.999.0710261425240.30120@woody.linux-foundation.org> Date: Fri, 26 Oct 2007 14:34:24 -0700 (PDT) From: Linus Torvalds <torvalds@...ux-foundation.org> To: Karl Schendel <kschendel@...allegro.com>, Zach Brown <zach.brown@...cle.com>, Benjamin LaHaise <bcrl@...ck.org>, Andrew Morton <akpm@...ux-foundation.org> cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, Nick Piggin <nickpiggin@...oo.com.au>, Leonid Ananiev <leonid.i.ananiev@...ux.intel.com> Subject: Re: [PATCH] Fix bad data from non-direct-io read after direct-io write Hmm. If I read this right, this bug seems to have been introduced by commit 65b8291c4000e5f38fc94fb2ca0cb7e8683c8a1b ("dio: invalidate clean pages before dio write") back in March. Before that, we'd call invalidate_inode_pages2_range() unconditionally after the call mapping->a_ops->direct_IO() if it was a write and there were cached pages in the mapping (well, "unconditionally" in the sense that it didn't depend on the return value of the ->direct_IO() call). However, with both the old and the new code _and_ with your patch, the return code - in case the invalidate failed - was corrupted. So we may actually end up doing some IO, but then returning the "wrong" error code from the invalidate. Hmm? Somebody who cares about direct-IO and who - unlike me - doesn't think it's a total and idiotic crock should think hard about this. I'm including Karl's email, but also an alternate patch for consideration. And maybe some day we can all agree that direct_IO is crap and should not be done. Linus -- diff --git a/mm/filemap.c b/mm/filemap.c index 5209e47..032371a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2510,22 +2510,17 @@ generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov, } retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs); - if (retval) + if (retval < 0) goto out; /* * Finally, try again to invalidate clean pages which might have been * faulted in by get_user_pages() if the source of the write was an * mmap()ed region of the file we're writing. That's a pretty crazy - * thing to do, so we don't support it 100%. If this invalidation - * fails and we have -EIOCBQUEUED we ignore the failure. + * thing to do, so we don't support it 100%. */ - if (rw == WRITE && mapping->nrpages) { - int err = invalidate_inode_pages2_range(mapping, - offset >> PAGE_CACHE_SHIFT, end); - if (err && retval >= 0) - retval = err; - } + if (rw == WRITE && mapping->nrpages) + invalidate_inode_pages2_range(mapping, offset >> PAGE_CACHE_SHIFT, end); out: return retval; } On Fri, 26 Oct 2007, Karl Schendel wrote: > > This patch fixes a race between direct IO writes and non-direct IO > reads on the same file. The symptom is a stale file page seen by > any non-direct-IO reader, which persists until the page is invalidated > somehow (e.g. page rewritten again, or memory pressure, or reboot). > > An improper return test caused direct-IO's after-write page invalidations > to be skipped. If we're writing page N, and the reader is reading > page N-x for small x, and the read code decides to readahead, it's > not too hard to cause a race that leaves an old, stale copy of the > page in the page cache. Retval is usually +nonzero after the > mapping->a_ops->direct_IO call! > > Signed-off-by: Karl Schendel <kschendel@...allegro.com> > > --- > > By the way, I agree that the userland situation is stupid, and I'm > addressing that in the application (happens to be the Ingres DBMS). > However, the kernel shouldn't compound the stupidity. > > I'll try to watch for replies, but it would be very useful to > cc me at kschendel@...allegro.com if any discussion is needed; > I'm not subscribed to lkml. > > > --- linux-2.6.23.1-base/mm/filemap.c 2007-10-12 12:43:44.000000000 -0400 > +++ linux-2.6.23.1/mm/filemap.c 2007-10-26 16:12:08.000000000 -0400 > @@ -2194,7 +2194,7 @@ generic_file_direct_IO(int rw, struct ki > } > > retval = mapping->a_ops->direct_IO(rw, iocb, iov, offset, nr_segs); > - if (retval) > + if (retval < 0) > goto out; > > /* > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists