Message-id: <20090721215421.GM4231@webber.adilger.int>
Date: Tue, 21 Jul 2009 15:54:21 -0600
From: Andreas Dilger <adilger@....com>
To: Frank Mayhar <fmayhar@...gle.com>
Cc: Eric Sandeen <sandeen@...hat.com>,
Curt Wohlgemuth <curtw@...gle.com>,
ext4 development <linux-ext4@...r.kernel.org>
Subject: Re: Question on fallocate/ftruncate sequence
On Jul 21, 2009 14:29 -0700, Frank Mayhar wrote:
> I've spent a little while today digging into this. My guess (only a
> guess at this point until I have a chance to prove it) is that
> i_disksize should be updated by fallocate() even when KEEP_SIZE is
> specified. It's currently not updated in that case.
No, that isn't correct. The intent of KEEP_SIZE is to allow fallocate
to preallocate blocks beyond EOF, so that it doesn't affect the file
data visible to userspace but can still avoid fragmentation of e.g.
log files or mbox files.
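A minimal userspace sketch of the KEEP_SIZE behaviour described above,
assuming Linux with glibc. The helper names (prealloc_keep_size,
keep_size_demo) and the /tmp scratch path are hypothetical and only
serve the illustration; the point is that st_size stays at the number
of bytes written whether or not the preallocation succeeds.

```c
#define _GNU_SOURCE             /* glibc: exposes fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_KEEP_SIZE */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reserve `len` bytes from offset 0 without changing the size that
 * userspace sees. Returns 0, or a negative errno value (for example
 * -EOPNOTSUPP when the filesystem cannot preallocate).
 */
int prealloc_keep_size(int fd, off_t len)
{
    assert(len > 0);
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) != 0)
        return -errno;
    return 0;
}

/*
 * Hypothetical demo: write 5 bytes to a scratch file, try to
 * preallocate `reserve` bytes, and return the resulting st_size
 * (-1 if the scratch file could not be set up). The outcome of the
 * preallocation itself is stored in *prealloc_rc.
 */
off_t keep_size_demo(off_t reserve, int *prealloc_rc)
{
    char path[] = "/tmp/keepsize-XXXXXX";   /* assumed scratch location */
    struct stat st;
    off_t size = -1;
    int fd = mkstemp(path);

    *prealloc_rc = -EIO;    /* overwritten once preallocation is attempted */
    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) == 5) {
        *prealloc_rc = prealloc_keep_size(fd, reserve);
        if (fstat(fd, &st) == 0)
            size = st.st_size;
    }
    close(fd);
    unlink(path);
    return size;
}
```

On a filesystem that supports preallocation, *prealloc_rc is 0 and
st_blocks grows, while the returned size is still 5.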
The i_disksize variable is just there to handle the lag in updating the
on-disk file size during truncate: the VFS updates i_size right away to
indicate a truncate, but in order to handle the truncation of files within
finite transaction sizes the on-disk file size has to be shrunk incrementally.
> It's my
> understanding that i_disksize should be the real allocation, right?
> While i_size is the size that has actually been used? If so, then
> setting i_disksize is probably what's missing.
The difference is that i_size is in the VFS inode, and represents the
current in-memory state, while i_disksize is in the ext4 private inode
data and represents what is currently in the on-disk inode.
If fallocate were to change i_disksize, that value would be written to
the on-disk inode, and after the next reboot the file size would become
whatever was stored in i_disksize.
That said, we might need some kind of flag in the on-disk inode to
indicate that it was preallocated beyond EOF. Otherwise, e2fsck will
try to extend the file size to match the block count, which isn't
correct. We could also use this flag to determine whether truncate
needs to be run on the inode even if the new size is the same as the
current size.
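The mismatch between file size and block count mentioned above can be
seen from userspace with stat(2). The following is only a heuristic
sketch, not e2fsck's actual check: the function name is hypothetical,
it assumes st_blocks is counted in 512-byte units (as on Linux), and it
can report false positives because filesystem metadata blocks may be
included in st_blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Heuristic only: true when more space is allocated to the file than
 * its apparent size needs, after rounding the size up to a whole
 * I/O block. Assumes 512-byte st_blocks units (Linux convention).
 */
bool may_have_blocks_past_eof(const struct stat *st)
{
    assert(st != NULL);

    long long allocated = (long long)st->st_blocks * 512;
    long long blksize = st->st_blksize > 0 ? (long long)st->st_blksize : 4096;
    long long needed = ((long long)st->st_size + blksize - 1) / blksize * blksize;

    return allocated > needed;
}
```

For a 5-byte file with a 4096-byte block size, 8 allocated sectors
(4096 bytes) is what the size needs, whereas 2048 sectors (1 MiB)
suggests blocks beyond EOF.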
As a workaround for now, you could truncate to (size+1), then again
truncate to (size) and it should have the desired effect.
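A minimal sketch of that workaround using ftruncate(2). The function
names and the /tmp scratch path are hypothetical; whether the second
call actually releases blocks preallocated beyond EOF depends on the
filesystem and kernel version, so treat this as an illustration of the
call sequence only.

```c
#define _POSIX_C_SOURCE 200809L   /* mkstemp(), ftruncate() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Truncate to size+1 and then back to size, so that the second call
 * is a real size change rather than a no-op.
 * Returns 0 on success or a negative errno value.
 */
int retruncate_to_size(int fd, off_t size)
{
    assert(size >= 0);
    if (ftruncate(fd, size + 1) != 0)   /* grow by one byte */
        return -errno;
    if (ftruncate(fd, size) != 0)       /* shrink back to the wanted size */
        return -errno;
    return 0;
}

/*
 * Hypothetical demo: write 5 bytes to a scratch file, apply the
 * workaround, and return the resulting st_size (-1 on any failure).
 */
off_t retruncate_demo(void)
{
    char path[] = "/tmp/retrunc-XXXXXX";    /* assumed scratch location */
    struct stat st;
    off_t size = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) == 5 &&
        retruncate_to_size(fd, 5) == 0 &&
        fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    unlink(path);
    return size;
}
```

Note that between the two calls the file is briefly one byte longer,
which a concurrent reader could observe.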
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.