Message-ID: <8e3e26bb-289d-46ac-958d-7084fb7169a9@hogyros.de>
Date: Wed, 15 Jan 2025 20:04:03 +0900
From: Simon Richter <Simon.Richter@...yros.de>
To: Andreas Wagner <thewand@....de>, linux-ext4@...r.kernel.org
Subject: Re: Two questions on sparse files
Hi,
On 1/15/25 17:34, Andreas Wagner wrote:
> 1. Do they really - as media and Wikipedia suggest - reduce allocated
> filesize if the file is sparse?
Yes. Unless explicitly requested, blocks are allocated to a file when
they are first written, and the file size updated at the same time. So
when you open a file and write 8192 bytes to it, on a file system with
4096 byte blocksize, the write operation will allocate two blocks and
set the file size to 8192.
If you instead seek to an offset of 8192 before writing those 8192
bytes, the write will still allocate two blocks to hold the data, but
the file size is set to 16384, and the first 8192 bytes read back as
zeros. Because no blocks have been allocated for that leading range, the
allocated file size is now smaller than the visible file size, and you
have created a sparse file.
So there is no special procedure involved in creating a sparse file; it
just happens if you're not writing the file from start to finish.
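As a rough illustration of the seek-then-write case above, here is a
minimal Python sketch. The file name, the helper names make_sparse_file
and sizes, and the 4096 byte block size are my assumptions for the
example. It also assumes a Unix-like system where st_blocks counts
512-byte units, as it does on Linux. The allocated size it prints
depends on the file system, so treat that number as indicative only.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, matching the example in the text


def make_sparse_file(path):
    """Seek past the start before the first write, leaving a hole."""
    fd = os.open(path, os.O_CREAT | os.O_TRUNC | os.O_RDWR, 0o600)
    try:
        os.lseek(fd, 2 * BLOCK, os.SEEK_SET)  # skip the first 8192 bytes
        os.write(fd, b"x" * (2 * BLOCK))      # then write 8192 bytes of data
    finally:
        os.close(fd)


def sizes(path):
    """Return (visible size, allocated size) in bytes."""
    st = os.stat(path)
    # st_blocks is in 512-byte units on Linux (an assumption elsewhere).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse.img")  # hypothetical file name
        make_sparse_file(path)
        visible, allocated = sizes(path)
        print("visible size:  ", visible)
        print("allocated size:", allocated)
```

On a file system that supports holes, the allocated size should come out
smaller than the visible size; "ls -ls" or "du" on the file shows the
same difference.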
> 2. Are the blocks of the file in random order?
Because block allocation happens when the data is written, there is no
guarantee that consecutive data in the file is stored in consecutive
blocks on disk, regardless of whether the file is sparse or not.
If you use a file as backing storage for a virtual machine's hard disk,
the access patterns involved will likely create a sparse file with
suboptimal on-disk layout, but the only way to avoid this is to allocate
the blocks before they are used.
If you want a good chance of getting consecutive blocks, you can use
posix_fallocate(3) to request that file system blocks be allocated at
once. This will grow the file to the requested size immediately (you
cannot have allocated but inaccessible blocks), and since it is a single
request, there is little chance of interference from other file system
accesses that may also want to allocate blocks at the same time.
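A minimal sketch of such a preallocation, using Python's
os.posix_fallocate wrapper around the same call. The helper name
preallocate, the file name and the 1 MiB size are assumptions for the
example, and it needs a Unix-like system. It only shows that the file
grows to the requested size in one request; it does not check how
contiguous the resulting blocks are.

```python
import os
import tempfile


def preallocate(path, size):
    """Ask the file system to allocate `size` bytes for `path` at once."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Allocate the range [0, size); the file grows to `size` bytes.
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)
    return os.stat(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "disk.img")  # hypothetical file name
        st = preallocate(path, 1024 * 1024)   # 1 MiB, chosen arbitrarily
        print("visible size:  ", st.st_size)
        # st_blocks is in 512-byte units on Linux (an assumption elsewhere).
        print("allocated size:", st.st_blocks * 512)
```

To see how many extents the file actually ended up with, something like
filefrag(8) from e2fsprogs can be run on it afterwards.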
There is no guarantee, though. Only LVM exposes allocation policies, so
if your desired performance or some other constraint requires contiguous
allocation, you need to use a logical volume. For typical use cases,
using fallocate is sufficient to produce a minimal number of allocations.
Simon