Message-ID: <20110518061356.GY19446@dastard>
Date: Wed, 18 May 2011 16:13:56 +1000
From: Dave Chinner <david@...morbit.com>
To: Eric Sandeen <sandeen@...hat.com>
Cc: Jiaying Zhang <jiayingz@...gle.com>, tytso@....edu,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()
On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote:
> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
> > flag and then ftruncate the file to a size larger than the file's i_size,
> > any allocated but unwritten blocks will be freed but the file size is set
> > to the size that ftruncate specifies.
> >
> > Here is a simple test to reproduce the problem:
> > 1. fallocate a 12k size file with KEEP_SIZE flag
> > 2. write the first 4k
> > 3. ftruncate the file to 8k
> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
> > shows the file has only the first written block left.
>
> To be honest I'm not 100% certain what the filesystem -should- do in this case.
>
> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>
> # xfs_bmap -vp testfile
> testfile:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000
> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000
Ok, so that's the case for a _truncate up_ from 4k to 8k:
$ rm /mnt/test/foo
$ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
fd.path = "/mnt/test/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 71
stat.type = regular file
stat.size = 0
stat.blocks = 24
fsxattr.xflags = 0x2 [-p------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..23]: 9712..9735 0 (9712..9735) 24 10000
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec)
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 9712..9719 0 (9712..9719) 8 00000
1: [8..23]: 9720..9735 0 (9720..9735) 16 10000
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 9712..9719 0 (9712..9719) 8 00000
1: [8..23]: 9720..9735 0 (9720..9735) 16 10000
fd.path = "/mnt/test/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 71
stat.type = regular file
stat.size = 8192
stat.blocks = 24
fsxattr.xflags = 0x2 [-p------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 2
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
But you get a different result on truncate down:
$ rm /mnt/test/foo
$ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
fd.path = "/mnt/test/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 71
stat.type = regular file
stat.size = 12288
stat.blocks = 24
fsxattr.xflags = 0x2 [-p------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 1
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..23]: 9584..9607 0 (9584..9607) 24 10000
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec)
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 9584..9591 0 (9584..9591) 8 00000
1: [8..23]: 9592..9607 0 (9592..9607) 16 10000
/mnt/test/foo:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: 9584..9591 0 (9584..9591) 8 00000
1: [8..15]: 9592..9599 0 (9592..9599) 8 10000
fd.path = "/mnt/test/foo"
fd.flags = non-sync,non-direct,read-write
stat.ino = 71
stat.type = regular file
stat.size = 8192
stat.blocks = 16
fsxattr.xflags = 0x2 [-p------------]
fsxattr.projid = 0
fsxattr.extsize = 0
fsxattr.nextents = 2
fsxattr.naextents = 0
dioattr.mem = 0x200
dioattr.miniosz = 512
dioattr.maxiosz = 2147483136
IOWs, on XFS a truncate up does not change the preallocation at all,
while a truncate down will _always_ remove preallocation beyond the
new EOF. It's always had this behaviour w.r.t. truncate(2) and
preallocation beyond EOF.
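To make that rule concrete, here is a rough Python sketch of it. This is a toy model, not XFS code: the truncate() helper, the tuple layout and the 4k block size are all assumptions made for illustration. Offsets are in 512-byte units, matching the bmap output above.

```python
# Toy model only: not XFS code. Offsets are 512-byte units, as in xfs_bmap.
SECTOR = 512
BLOCK = 4096  # assumed filesystem block size


def truncate(extents, old_size, new_size):
    """Return the extent list after truncating from old_size to new_size.

    extents is a sorted list of (first, last, state) tuples in 512-byte units.
    """
    if new_size >= old_size:
        # Truncate up: preallocation beyond EOF is left alone.
        return list(extents)
    # Truncate down: round the new EOF up to a block boundary and
    # drop everything at or beyond it.
    eof = -(-new_size // BLOCK) * BLOCK // SECTOR
    kept = []
    for first, last, state in extents:
        if first < eof:
            kept.append((first, min(last, eof - 1), state))
    return kept


# Extent map after "pwrite 0 4k" in both runs above.
after_write = [(0, 7, "written"), (8, 23, "unwritten")]
after_up = truncate(after_write, old_size=4096, new_size=8192)
after_down = truncate(after_write, old_size=12288, new_size=8192)
print(after_up)
print(after_down)
```

With the extent map from the two runs above, the truncate up leaves [8..23] intact and the truncate down trims it to [8..15], which is what the bmap output shows.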
> I think this is a different result from ext4, either with or without your patch.
>
> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>
> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
It's entirely up to the filesystem how it treats blocks beyond EOF
during truncation. XFS frees them on truncate down, because it is
much safer to just truncate away everything beyond the new EOF than
to leave written extents beyond EOF as potential landmines.
Indeed, that's why calling vmtruncate() is a bad fix. If you have:
        NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU
....----+----------+--------+--------+
        A          B        C        D

Where   A    = new EOF (N)
        A->B = unwritten (U)
        B->C = written (W)
        C    = old EOF (O)
        C->D = unwritten (U)
Then just calling vmtruncate() will leave the blocks in the range
B->C as written blocks. A subsequent extending truncate back out to
D will then expose stale data rather than zeros in the range
B->C....
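The scenario in the diagram can be sketched as a toy model in Python. Nothing here is kernel code: the dict-based file, the helper names and the block numbers (A=0, B=2, C=4, D=6) are hypothetical, and truncate_size_only() only models the outcome described above, where the written blocks in B->C survive the truncate down.

```python
# Toy model of the diagram above, in whole blocks. Not kernel code.
A, B, C, D = 0, 2, 4, 6  # hypothetical block numbers for the four points


def make_file():
    """File with i_size at C: A->B unwritten, B->C written, C->D unwritten."""
    blocks = {}
    for i in range(A, B):
        blocks[i] = ("U", None)
    for i in range(B, C):
        blocks[i] = ("W", "S")  # "S" stands for old data on disk
    for i in range(C, D):
        blocks[i] = ("U", None)
    return {"size": C, "blocks": blocks}


def truncate_size_only(f, new_size):
    """Change i_size but leave every allocated block in place."""
    f["size"] = new_size


def truncate_and_free(f, new_size):
    """Change i_size and, on truncate down, free blocks beyond the new EOF."""
    if new_size < f["size"]:
        f["blocks"] = {i: b for i, b in f["blocks"].items() if i < new_size}
    f["size"] = new_size


def read_all(f):
    """Holes and unwritten blocks read as zeros; written blocks return data."""
    out = []
    for i in range(f["size"]):
        state, data = f["blocks"].get(i, ("hole", None))
        out.append(data if state == "W" else "0")
    return "".join(out)


f = make_file()
truncate_size_only(f, A)   # truncate down to the new EOF
truncate_size_only(f, D)   # extending truncate back out to D
leaky = read_all(f)

f = make_file()
truncate_and_free(f, A)
truncate_and_free(f, D)
safe = read_all(f)

print(leaky)  # 00SS00
print(safe)   # 000000
```

In the first run the old contents of B->C become readable again after the extending truncate; in the second run the whole range reads back as zeros, which is what truncate(2) is expected to give you.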
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com