lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1168544965.6170.76.camel@lade.trondhjem.org>
Date:	Thu, 11 Jan 2007 14:49:25 -0500
From:	Trond Myklebust <trond.myklebust@....uio.no>
To:	Linus Torvalds <torvalds@...l.org>
Cc:	Xavier Bestel <xavier.bestel@...e.fr>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Aubrey <aubreylee@...il.com>, Hua Zhong <hzhong@...il.com>,
	Hugh Dickins <hugh@...itas.com>, linux-kernel@...r.kernel.org,
	hch@...radead.org, kenneth.w.chen@...el.com, akpm@...l.org,
	mjt@....msk.ru
Subject: Re: O_DIRECT question

On Thu, 2007-01-11 at 11:00 -0800, Linus Torvalds wrote:
> 
> On Thu, 11 Jan 2007, Trond Myklebust wrote:
> > 
> > For NFS, the main feature of interest when it comes to O_DIRECT is
> > strictly uncached I/O. Replacing it with POSIX_FADV_NOREUSE won't help
> > because it can't guarantee that the page will be thrown out of the page
> > cache before some second process tries to read it. That is particularly
> > true if some dopey third party process has mmapped the file.
> 
> You'd still be MUCH better off using the page cache, and just forcing the 
> IO (but _with_ all the page cache synchronization still active). Which is 
> trivial to do on the filesystem level, especially for something like NFS.
> 
> If you bypass the page cache, you just make that "dopey third party 
> process" problem worse. You now _guarantee_ that there are aliases with 
> different data.

Quite, but that is sometimes an admissible state of affairs.

One of the things that was infuriating when we were trying to do shared
databases over the page cache was that someone would start some
unsynchronised process that had nothing to do with the database itself
(it would typically be a process that was backing up the rest of the
disk or something like that). Said process would end up pinning pages in
memory, and prevented the database itself from getting updated data from
the server.

IOW: the problem was not that of unsynchronised I/O per se. It was
rather that of allowing the application to set up its own
synchronisation barriers and to ensure that no pages are cached across
these barriers. POSIX_FADV_NOREUSE can't offer that guarantee.

> Of course, with NFS, the _server_ will resolve any aliases anyway, so at 
> least you don't get file corruption, but you can get some really strange 
> things (like the write of one process actually happening before, but being 
> flushed _after_ and overriding the later write of the O_DIRECT process).

Writes are not the real problem here since shared databases typically do
implement sufficient synchronisation, and NFS can guarantee that only
the dirty data will be written out. However reading back the data is
problematic when you have insufficient control over the page cache.

The other issue is, of course, that databases don't _want_ to cache the
data in this situation, so the extra copy to the page cache is just a
bother. As you pointed out, that becomes less of an issue as processor
caches and memory speeds increase, but it is still apparently a
measurable effect.

Cheers
  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ