Message-ID: <BANLkTi=UzYsLOj8jbfp_wq-r_rFaTU3AAg@mail.gmail.com>
Date: Wed, 18 May 2011 13:42:34 -0700
From: Jiaying Zhang <jiayingz@...gle.com>
To: Dave Chinner <david@...morbit.com>
Cc: Eric Sandeen <sandeen@...hat.com>, tytso@....edu,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr()
On Tue, May 17, 2011 at 11:13 PM, Dave Chinner <david@...morbit.com> wrote:
> On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote:
>> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
>> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>> > flag and then ftruncate the file to a size larger than the file's i_size,
>> > any allocated but unwritten blocks will be freed but the file size is set
>> > to the size that ftruncate specifies.
>> >
>> > Here is a simple test to reproduce the problem:
>> > 1. fallocate a 12k size file with KEEP_SIZE flag
>> > 2. write the first 4k
>> > 3. ftruncate the file to 8k
>> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>> > shows the file has only the first written block left.
>>
>> To be honest I'm not 100% certain what the filesystem -should- do in this case.
>>
>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>>
>> # xfs_bmap -vp testfile
>> testfile:
>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
>> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000
>> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000
>
> Ok, so that's the case for a _truncate up_ from 4k to 8k:
>
> $ rm /mnt/test/foo
> $ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 0
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> /mnt/test/foo:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..23]: 9712..9735 0 (9712..9735) 24 10000
> wrote 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec)
> /mnt/test/foo:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000
> 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000
> /mnt/test/foo:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000
> 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 8192
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 2
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
>
> But you get a different result on truncate down:
>
> $ rm /mnt/test/foo
> $ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 12288
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> /mnt/test/foo:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..23]: 9584..9607 0 (9584..9607) 24 10000
> wrote 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec)
> /mnt/test/foo:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000
> 1: [8..23]: 9592..9607 0 (9592..9607) 16 10000
> /mnt/test/foo:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000
> 1: [8..15]: 9592..9599 0 (9592..9599) 8 10000
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 8192
> stat.blocks = 16
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 2
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
>
> IOWs, on XFS a truncate up does not change the preallocation at all,
> while a truncate down will _always_ remove preallocation beyond the
> new EOF. It's always had this behaviour w.r.t. truncate(2) and
> preallocation beyond EOF.
>
>> I think this is a different result from ext4, either with or without your patch.
>>
>> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>>
>> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
>
> It's entirely up to the filesystem how it treats blocks beyond EOF
> during truncation. XFS frees them on truncate down, because it is
> much safer to just truncate away everything beyond the new EOF than
> to leave written extents beyond EOF as potential landmines.
>
> Indeed, that's why calling vmtruncate() is a bad fix. If you have:
>
>
>         NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU
> ....----+----------+--------+--------+
>         A          B        C        D
>
> Where A = new EOF (N)
> A->B = unwritten (U)
> B->C = written (W)
> C = old EOF (O)
> C->D = unwritten (U)
>
> Then just calling vmtruncate() will leave the blocks in the range
> B->C as written blocks. Hence then doing an extending truncate back
> out to D will expose stale data rather than zeros in the range
> B->C....
Sorry, I am a little confused. If I understand correctly, in the situation
you described, we call a truncate that moves EOF from C to A. On ext4, that
should free all of the blocks beyond A. Then, when we do an extending
truncate to D, any blocks beyond A should be treated as unwritten blocks,
so we should not expose any stale data, right?
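
To make the question concrete, here is a toy model of the A/B/C/D layout
above. It is plain Python, not kernel code: the block offsets, the ToyFile
class and both shrink helpers are made up for illustration. set_size_only()
only stands in for "i_size moves but the blocks stay", and is not a claim
about what vmtruncate() actually does.

```python
# Toy model only: one list entry per block, no page cache, no partial blocks.
HOLE, UNWRITTEN, WRITTEN = "hole", "unwritten", "written"
A, B, C, D = 2, 4, 6, 8  # hypothetical block offsets for the diagram's A..D


class ToyFile:
    def __init__(self):
        self.size = C  # old EOF is at C
        self.blocks = (
            [(WRITTEN, "data")] * A            # 0..A: ordinary file data
            + [(UNWRITTEN, None)] * (B - A)    # A..B: unwritten
            + [(WRITTEN, "old")] * (C - B)     # B..C: written
            + [(UNWRITTEN, None)] * (D - C)    # C..D: unwritten, past old EOF
        )

    def read(self, blk):
        """Return what a reader would see for one block inside i_size."""
        if blk >= self.size:
            raise ValueError("read past EOF")
        state, payload = self.blocks[blk]
        # Holes and unwritten blocks read back as zeros.
        return payload if state == WRITTEN else "zero"

    def set_size_only(self, new_size):
        """Move i_size and leave every block mapping untouched."""
        self.size = new_size

    def truncate_freeing(self, new_size):
        """On a shrink, free every block at or beyond the new EOF."""
        if new_size < self.size:
            for blk in range(new_size, len(self.blocks)):
                self.blocks[blk] = (HOLE, None)
        self.size = new_size


def readback(shrink):
    """Shrink C -> A with the given helper, extend to D, read everything."""
    f = ToyFile()
    shrink(f, A)
    f.set_size_only(D)  # extending truncate: only i_size changes
    return [f.read(blk) for blk in range(D)]


print(readback(ToyFile.set_size_only))
# ['data', 'data', 'zero', 'zero', 'old', 'old', 'zero', 'zero']
print(readback(ToyFile.truncate_freeing))
# ['data', 'data', 'zero', 'zero', 'zero', 'zero', 'zero', 'zero']
```

In this model the old contents of B->C only reappear when the shrink leaves
the written blocks in place. If the shrink frees everything beyond A, the
later extension to D reads back zeros, which is the behaviour I would
expect from ext4.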
Jiaying
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@...morbit.com
>