Message-ID: <alpine.LFD.1.10.0806181704310.2907@woody.linux-foundation.org>
Date: Wed, 18 Jun 2008 17:20:41 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Robert Mueller <robm@...tmail.fm>
cc: Bron Gondwana <brong@...tmail.fm>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Nick Piggin <npiggin@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Andi Kleen <andi@...stfloor.org>, Ingo Molnar <mingo@...e.hu>,
Ken Murchison <murch@...rew.cmu.edu>
Subject: Re: Cyrus mmap vs lseek/write usage - (WAS: BUG: mmapfile/writev
spurious zero bytes (x86_64/not i386, bisected, reproducable))
On Thu, 19 Jun 2008, Robert Mueller wrote:
>
> As noted above, one thing cyrus does which does seem to be plain "wrong"
> is that it mmaps a region greater than the file size (rounds to an 8k
> boundary, but 8k-16k past the current end of the file) and then assumes
> that when it writes to the end of the file (but less than the end of the
> mmap region) that there's no need to remmap and that data is immediately
> available within the previous mmaped region.
Pretty much any OS that tries to make mmap() coherent with regular
read/write accesses will automatically also have to be coherent wrt file
size updates.
IOW, I don't think that cyrus is really any more "wrong" in this than in
assuming that you can mix read/write and mmap() accesses. In fact, I
suspect that Cyrus is probably _more_ conservative than most, in that it
would not be totally unheard of to just do a much bigger mmap(), and not
even bother to re-do it until the file grows past that size (ie no 8k/16k
granularity, but make it arbitrarily non-granular).
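(Editorial sketch, not part of the original mail: the pattern under discussion, mapping past the current end of file and then appending with write(), might look roughly like the C below. The function names, the 16k window, and the scratch-file path argument are all hypothetical. On a kernel with a coherent page cache the check would be expected to report the appended bytes as visible, but nothing in POSIX promises that.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map map_len bytes of fd (more than the file
 * holds), append msg with write(), and report whether the new bytes
 * show up through the mapping that was set up before the write.
 * Returns 1 if visible, 0 if not, -1 on error. */
int appended_data_visible(int fd, size_t map_len, const char *msg)
{
    size_t msg_len = strlen(msg);
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
    /* The appended bytes must still fall inside the mapped window. */
    assert((size_t)end + msg_len <= map_len);

    char *map = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    int result = -1;
    /* Only read the mapping after the write, so the page is backed by
     * the file; pages wholly past EOF would raise SIGBUS if touched. */
    if (write(fd, msg, msg_len) == (ssize_t)msg_len)
        result = (memcmp(map + end, msg, msg_len) == 0);

    munmap(map, map_len);
    return result;
}

/* Hypothetical driver: path is a scratch file that gets overwritten. */
int demo_overmap(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int result = -1;
    if (write(fd, "hello", 5) == 5)
        result = appended_data_visible(fd, 16384, " world");

    close(fd);
    unlink(path);
    return result;
}
```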
> Apparently that works on most OS's (but is what this bug actually
> exposed), but according to the mmap docs:
>
> ---
> If the size of the mapped file changes after the call to mmap() as a
> result of some other operation on the mapped file, the effect of
> references to portions of the mapped region that correspond to added or
> removed portions of the file is unspecified.
Note that if you really want to be portable, you simply must not mix
mmap() with *any* other operations without sprinkling in a healthy amount
of "msync()" or unmapping/remapping entirely.
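(Editorial sketch, not part of the original mail: the conservative "unmap and remap" option could be written along these lines. The helper and driver names are hypothetical, and the sketch only covers remapping; where msync() calls would be needed depends on the platform and is not shown.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: drop the old mapping (if any) and map the whole
 * file again at its current size. Returns MAP_FAILED on error or when
 * the file is empty; otherwise *len holds the length of the new view. */
void *remap_whole_file(int fd, void *old_map, size_t *len)
{
    struct stat st;

    assert(len != NULL);
    if (old_map != NULL && old_map != MAP_FAILED)
        munmap(old_map, *len);
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return MAP_FAILED;

    *len = (size_t)st.st_size;
    return mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
}

/* Hypothetical driver: returns 0 if the fresh view shows both writes. */
int demo_remap(const char *path)
{
    size_t len = 0;
    int ok = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, "abc", 3) == 3) {
        char *map = remap_whole_file(fd, NULL, &len);
        if (map != MAP_FAILED && write(fd, "def", 3) == 3) {
            /* The file grew: never touch the old view, build a new one. */
            map = remap_whole_file(fd, map, &len);
        }
        if (map != MAP_FAILED) {
            if (len == 6 && memcmp(map, "abcdef", 6) == 0)
                ok = 0;
            munmap(map, len);
        }
    }

    close(fd);
    unlink(path);
    return ok;
}
```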
So _in_practice_ - because everybody tries to do a good job - you can
actually expect to have mmap() be coherent, even though there are no real
guarantees.
> Amazingly (apart from HP/UX) no OS actually seems to have a problem with
> this since there would be massive cyrus bug reports otherwise.
Yeah. Over the years, the pain from having a non-coherent mmap() generally
has pushed everybody into just making mmap() easy to use. Which means that
mixing things generally works fine, even if it is not at all _guaranteed_.
So I'd expect mmap+write to work and be coherent almost always. But it's
still a fairly unusual combination, and I would personally think that
using MAP_SHARED and writing through the mmap() would be the less
surprising model.
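(Editorial sketch, not part of the original mail: one way to read "using MAP_SHARED and writing through the mmap()" is to grow the file with ftruncate(), store the new bytes through a shared writable mapping, and msync() before unmapping. The function names and scratch-file path are hypothetical, and remapping the whole file on every append is chosen for brevity, not efficiency.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: append msg to fd by writing through a
 * MAP_SHARED mapping instead of calling write(). Returns 0 on success. */
int append_via_mapping(int fd, const char *msg)
{
    size_t msg_len = strlen(msg);
    struct stat st;

    assert(msg_len > 0);
    if (fstat(fd, &st) != 0)
        return -1;

    size_t old_len = (size_t)st.st_size;
    size_t new_len = old_len + msg_len;

    /* Grow the file first so the mapped pages are backed by it. */
    if (ftruncate(fd, (off_t)new_len) != 0)
        return -1;

    char *map = mmap(NULL, new_len, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map + old_len, msg, msg_len);

    /* Ask for the dirty pages to be written back before unmapping. */
    int rc = msync(map, new_len, MS_SYNC);
    munmap(map, new_len);
    return rc;
}

/* Hypothetical driver: returns 0 if read() sees what the mapping wrote. */
int demo_shared_write(const char *path)
{
    char buf[16] = {0};
    int ok = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (append_via_mapping(fd, "hello") == 0 &&
        append_via_mapping(fd, " world") == 0 &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, buf, sizeof buf) == 11 &&
        memcmp(buf, "hello world", 11) == 0)
        ok = 0;

    close(fd);
    unlink(path);
    return ok;
}
```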
Linus