[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070305114109.GA454@lazybastard.org>
Date: Mon, 5 Mar 2007 12:41:09 +0100
From: Jörn Engel <joern@...ybastard.org>
To: Arnd Bergmann <arnd@...db.de>
Cc: Ulrich Drepper <drepper@...hat.com>,
Anton Altaparmakov <aia21@....ac.uk>,
Christoph Hellwig <hch@...radead.org>,
Dave Kleikamp <shaggy@...ux.vnet.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
"Amit K. Arora" <aarora@...ux.vnet.ibm.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-ext4@...r.kernel.org, suparna@...ibm.com, cmm@...ibm.com,
alex@...sterfs.com, suzuki@...ibm.com
Subject: Re: [RFC] Heads up on sys_fallocate()
On Mon, 5 March 2007 01:36:36 +0100, Arnd Bergmann wrote:
>
> Using the current glibc implementation on a compressed file system ideally
> should be a very expensive no-op because you won't actually allocate much
> space for a file when writing zeroes to it. You also don't benefit of a
> contiguous allocation in logfs, since flash has uniform seek times over
> all the medium.
>
> I'd suggest you implement posix_fallocate as an real nop and just return
> success without doing anything. You could also return ENOSPC in case
> the blocks requested by posix_fallocate don't fit on the medium without
> compression, but that is more or less just guesswork (like statfs is).
Quoting POSIX_FALLOCATE(3):
The function posix_fallocate() ensures that disk space is allocated for
the file referred to by the descriptor fd for the bytes in the range
starting at offset and continuing for len bytes. After a successful
call to posix_fallocate(), subsequent writes to bytes in the specified
range are guaranteed not to fail because of lack of disk space.
If the size of the file is less than offset+len, then the file is
increased to this size; otherwise the file size is left unchanged.
Afaics, the (main) purpose of this function is not to decrease
fragmentation but to ensure mmap() won't cause any problems because the
medium fills up. That problem exists for LogFS as well, once rw mmap()
is supported.
Simply returning success without doing anything would be a bug. -ENOSPC
is a better choice, but still a lame implementation. And falling back
on libc to write zeroes in a loop is an exercise in futility.
Does the allocation have to be persistent beyond lifetime of the file
descriptor? It would be fairly simple to support the write guarantee
while the file is open (or rather the inode remains cached) and drop it
afterwards.
Jörn
--
"[One] doesn't need to know [...] how to cause a headache in order
to take an aspirin."
-- Scott Culp, Manager of the Microsoft Security Response Center, 2001
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists