Date:	Tue, 30 Sep 2014 15:25:17 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	Valdis.Kletnieks@...edu
Cc:	Matthew Wilcox <willy@...ux.intel.com>,
	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v11 00/21] Add support for NV-DIMMs to ext4

On Sep 30, 2014, at 2:37 PM, Valdis.Kletnieks@...edu wrote:
> On Tue, 30 Sep 2014 12:08:41 -0400, Matthew Wilcox said:
> 
>> The more I think about this, the more I think this is a bad idea.
>> When you have a file open with O_DIRECT, your I/O has to be done in
>> 512-byte multiples, and it has to be aligned to 512-byte boundaries
>> in memory.  If an unsuspecting application has O_DIRECT forced on it,
>> it isn't going to know to do that, and so all its I/Os will fail.
> 
> I'm thinking of more than one place where that would be a feature, not a bug. :)
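[To make the alignment constraint concrete, here is a minimal sketch,
not from the patch set itself: it assumes a 512-byte logical block
size and takes the file name from the command line.]

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	void *buf;
	int fd;

	if (argc < 2)
		return 1;

	fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT buffers must be aligned to the logical block size. */
	if (posix_memalign(&buf, 512, 512))
		return 1;
	memset(buf, 'x', 512);

	/* Aligned offset and length: succeeds. */
	if (pwrite(fd, buf, 512, 0) < 0)
		perror("aligned pwrite");

	/* Unaligned length: fails with EINVAL, which is exactly what
	 * an unsuspecting application would hit. */
	if (pwrite(fd, buf, 100, 512) < 0)
		perror("unaligned pwrite");

	free(buf);
	close(fd);
	return 0;
}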

We prototyped a feature like this for Lustre, so that admins could
force I/O into O_DIRECT: HPC compute nodes have relatively little
RAM per core, and we don't want file data cache consuming RAM that
the compute jobs need.

Unfortunately, the O_DIRECT semantics are a killer for poorly written
applications that end up doing small synchronous writes.  We didn't
have any I/O size problems, because the Lustre client has to copy the
data to the servers anyway, so arbitrary I/O sizes are fine.

While this _might_ be OK for NVRAM mapped directly into the
filesystem, for local disk-based storage it is painful: 512-byte
synchronous writes at a disk's ~100 IOPS add up to only 50KB/s,
versus ~100MB/s for cached writes to a single disk.

I think you would be much better off with more aggressive "use once"
semantics in the page cache, so that pages from streaming writes are
evicted from cache early, rather than going down the "automatic
O_DIRECT" hole.


>> What problem are you really trying to solve?  Some big files hogging
>> the page cache?
> 
> I'm officially a storage admin.  I mostly support HPC and research. As
> such, I'm always looking to add tools to my toolkit. :)
> 
> (And yes, I fully recognize that *in general*, this is a Bad Idea.  However,
> when you've got That One Problem Data File that *should* always be accessed
> via O_DIRECT, and *usually* is accessed via O_DIRECT, and bad things happen
> when something accesses it without it (for instance, when the file is 1.5X
> the actual RAM), you start looking for fixes.  If you've got another, more
> sustainable way to say "do not let file /X/Y/Z hog the page cache" (and
> no, LD_PRELOAD isn't sustainable the way chattr is, in my book), feel free
> to recommend something. :)
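
[For reference, the LD_PRELOAD route would be roughly the shim below,
a hypothetical sketch using the /X/Y/Z path from above.  It also shows
why it isn't sustainable: it misses openat()/open64(), statically
linked binaries, and any process not launched with the preload set,
whereas a chattr flag travels with the file.]

/* Build: gcc -shared -fPIC shim.c -o shim.so -ldl
 * Use:   LD_PRELOAD=./shim.so some_application */
#define _GNU_SOURCE		/* for RTLD_NEXT */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/types.h>

#define TARGET "/X/Y/Z"		/* the one problem data file */

int open(const char *path, int flags, ...)
{
	static int (*real_open)(const char *, int, ...);
	mode_t mode = 0;

	if (!real_open)
		real_open = (int (*)(const char *, int, ...))
				dlsym(RTLD_NEXT, "open");

	if (flags & O_CREAT) {
		va_list ap;

		va_start(ap, flags);
		mode = va_arg(ap, mode_t);
		va_end(ap);
	}

	/* Force O_DIRECT on the one file that must not be cached. */
	if (!strcmp(path, TARGET))
		flags |= O_DIRECT;

	return real_open(path, flags, mode);
}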


Cheers, Andreas





