Date:	Sat, 13 Jan 2007 00:54:23 +0300
From:	Michael Tokarev <mjt@....msk.ru>
To:	Linus Torvalds <torvalds@...l.org>
CC:	Chris Mason <chris.mason@...cle.com>,
	dean gaudet <dean@...tic.org>, Viktor <vvp01@...ox.ru>,
	Aubrey <aubreylee@...il.com>, Hua Zhong <hzhong@...il.com>,
	Hugh Dickins <hugh@...itas.com>, linux-kernel@...r.kernel.org,
	hch@...radead.org, kenneth.w.chen@...el.com, akpm@...l.org
Subject: Re: O_DIRECT question

Linus Torvalds wrote:
[]
> My point is that you can get basically ALL THE SAME GOOD BEHAVIOUR without 
> having all the BAD behaviour that O_DIRECT adds.

*This* point I got from the beginning, once I tried to think about how it all
is done internally (I never thought about that before, because I'm not a kernel
hacker to start with): currently, Linux has ugly/racy places which are
either difficult or impossible to fix, all due to this O_DIRECT thing,
which interacts badly with other access "methods".

> For example, just the requirement that O_DIRECT can never create a file 
> mapping, and can never interact with ftruncate would actually make 
> O_DIRECT a lot more palatable to me. Together with just the requirement 
> that an O_DIRECT open would literally disallow any non-O_DIRECT accesses, 
> and flush the page cache entirely, would make all the aliases go away.
> 
> At that point, O_DIRECT would be a way of saying "we're going to do 
> uncached accesses to this pre-allocated file". Which is a half-way 
> sensible thing to do.

Half-way?

> But what O_DIRECT does right now is _not_ really sensible, and the 
> O_DIRECT propeller-heads seem to have some problem even admitting that 
> there _is_ a problem, because they don't care. 

Well.  In fact, there are NO problems to admit.

Yes, yes, yes, yes: when you think about it from a general point of
view, and think about how non-O_DIRECT and O_DIRECT access fit together,
it's a complete mess, and you're 100% right that it's a mess.

But.  Those damn "database people" don't mix and match the two kinds of
access (I'm not one of them, either; I'm just trying to use a DB
product on Linux).  So there's just no issue.  The solution to in-kernel
races and problems in this case lies in the usage scenario, and in following
simple usage rules.  Basically, the above requirement, "don't mix and match
the two", is implemented in userspace (yes, there's no guarantee
that someone or something will not do some evil thing, but that's controlled
by file permissions).  That is, the database software itself will not try to
use the thing in a wrong way.  Simple as that.
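A sketch of what "implemented in userspace" can look like, assuming Linux and that every cooperating process opens the file through the same hypothetical `open_exclusive_direct()` helper. `flock()` locks are advisory: they only bind programs that honour the convention, and anything else is kept out by file permissions alone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an existing, preallocated file for direct I/O
 * and take an exclusive advisory lock on it. A second cooperating opener
 * gets -EBUSY instead of silently getting a second (possibly buffered) handle.
 */
int open_exclusive_direct(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_DIRECT);  /* no O_CREAT, no O_TRUNC */
    if (fd < 0)
        return -errno;

    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int err = errno;
        close(fd);
        return err == EWOULDBLOCK ? -EBUSY : -err;
    }
    return fd;  /* the lock is released when this descriptor is closed */
}
```

Because `flock()` treats separate `open()` calls independently, the check also catches a second open from within the same process.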

> A lot of DB people seem to simply not care about security or anything 
> else. I'm trying to tell you that quoting numbers is 
> pointless, when simply the CORRECTNESS of O_DIRECT is very much in doubt.

When done properly, be it in user- or kernel-space, it IS correct.  No
database people are calling ftruncate() on a file *and* reading past its end
at the same time, for example, and they don't mix and match cached and direct
I/O, at least not for the same part of a file (if some do, they're really
braindead, or it's just a plain bug).

> I can calculate PI to a billion decimal places in my head in .1 seconds. 
> If you don't care about the CORRECTNESS of the result, that is.
> 
> See? It's not about performance. It's about O_DIRECT being fundamentally 
> broken as it behaves right now.

Again, recall the above: the actual USAGE of O_DIRECT, as implemented
in database software, tries to ensure there's no brokenness, especially
fundamental brokenness, simply by not performing parallel direct/non-direct
reads/writes/truncates.  This way, the thing Just Works, works *correctly*
(provided there are no bugs all the way down to the device), *and* works *fast*.

By the way, I can think of some useful cases where *parts* of a file are
mmap()ed (even for RW access), and other parts are read/written with O_DIRECT.
But those are probably corner cases.
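One way such a corner case might be laid out, assuming Linux/glibc and a hypothetical fixed layout: a mmap()ed header region followed by a data region touched only through an O_DIRECT descriptor. The names `split_open()`, `split_write_data()`, `dio_buffer()`, `HDR_SIZE` and `FILE_SIZE` are made up for this sketch, and the 4096-byte alignment is an assumption; the real requirement depends on the device and filesystem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: [0, HDR_SIZE) is mmap()ed, the rest is O_DIRECT-only. */
#define HDR_SIZE  (64 * 1024)
#define FILE_SIZE (1024 * 1024)
#define DIO_ALIGN 4096                 /* assumed O_DIRECT alignment */

struct split_file {
    int map_fd;                        /* ordinary fd backing the mapping */
    int direct_fd;                     /* O_DIRECT fd, data region only */
    unsigned char *hdr;                /* shared mapping of the header */
};

/* Zeroed buffer meeting the assumed O_DIRECT memory alignment. */
void *dio_buffer(size_t len)
{
    void *p = NULL;
    if (len == 0 || len % DIO_ALIGN != 0 || posix_memalign(&p, DIO_ALIGN, len) != 0)
        return NULL;
    memset(p, 0, len);
    return p;
}

int split_open(const char *path, struct split_file *sf)
{
    assert(sf != NULL);
    sf->direct_fd = -1;
    sf->hdr = NULL;
    sf->map_fd = open(path, O_RDWR | O_CREAT, 0600);
    if (sf->map_fd < 0)
        return -errno;

    /* Size the file once, up front; neither path extends or truncates it. */
    int rc = posix_fallocate(sf->map_fd, 0, FILE_SIZE);
    if (rc == 0) {
        void *p = mmap(NULL, HDR_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, sf->map_fd, 0);
        if (p == MAP_FAILED)
            rc = errno;
        else
            sf->hdr = p;
    }
    if (rc == 0) {
        sf->direct_fd = open(path, O_RDWR | O_DIRECT);
        if (sf->direct_fd < 0)
            rc = errno;
    }
    if (rc != 0) {
        if (sf->hdr)
            munmap(sf->hdr, HDR_SIZE);
        close(sf->map_fd);
        sf->map_fd = -1;
        sf->hdr = NULL;
        return -rc;
    }
    return 0;
}

/* Direct writes may only target the data region, never the mapped header. */
ssize_t split_write_data(struct split_file *sf, const void *buf, size_t len, off_t off)
{
    if (off < HDR_SIZE || off + (off_t)len > FILE_SIZE)
        return -ERANGE;
    if (off % DIO_ALIGN != 0 || len % DIO_ALIGN != 0 || (uintptr_t)buf % DIO_ALIGN != 0)
        return -EINVAL;
    ssize_t n = pwrite(sf->direct_fd, buf, len, off);
    return n < 0 ? -errno : n;
}

void split_close(struct split_file *sf)
{
    msync(sf->hdr, HDR_SIZE, MS_SYNC);
    munmap(sf->hdr, HDR_SIZE);
    close(sf->direct_fd);
    close(sf->map_fd);
}
```

The property that matters is that the two regions never overlap, so no direct write ever targets the pages behind the mapping.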

/mjt