Date:   Tue, 17 Apr 2018 11:14:23 -0700
From:   "Darrick J. Wong" <darrick.wong@...cle.com>
To:     Andreas Dilger <adilger@...ger.ca>
Cc:     Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: mke2fs and -E no discard

On Tue, Apr 17, 2018 at 11:58:16AM -0600, Andreas Dilger wrote:
> I was just looking at a posting on stackexchange related to mke2fs
> and filesystem image fragmentation:
> 
>     https://unix.stackexchange.com/questions/287133/why-is-fragmentation-level-so-huge-in-files-that-contain-other-filesystems
> 
> It seems like "mke2fs" is causing issues for creation of filesystems in
> preallocated files, as it is discarding the previously-allocated blocks,
> even if the "-E nodiscard" option is used.
> 
> # dd if=/dev/zero of=/var/tmp/tt bs=1M count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.272006 s, 385 MB/s
> # sync
> # filefrag -v /var/tmp/tt
> Filesystem type is: ef53
> File size of /var/tmp/tt is 104857600 (25600 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..   24575:    1261568..   1286143:  24576:
>    1:    24576..   25599:    1292288..   1293311:   1024:    1286144: last,eof
> /var/tmp/tt: 2 extents found
> # ./misc/mke2fs -E nodiscard /var/tmp/tt
> mke2fs 1.43-WIP (15-Mar-2016)
> Creating filesystem with 102400 1k blocks and 25688 inodes
> Filesystem UUID: 3f8a0ca8-70b8-4801-b834-99dff0a6a642
> Superblock backups stored on blocks:
> 	8193, 24577, 40961, 57345, 73729
> 
> Allocating group tables: done
> Writing inode tables: done
> Writing superblocks and filesystem accounting information: done
> 
> [root@...kie e2fsprogs-git]# filefrag -v /var/tmp/tt
> Filesystem type is: ef53
> File size of /var/tmp/tt is 104857600 (25600 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..      65:    1261568..   1261633:     66:
>    1:      127..    2113:    1261695..   1263681:   1987:    1261634:
>    2:     2175..    4096:    1263743..   1265664:   1922:    1263682:
>    3:     4158..    6209:    1265726..   1267777:   2052:    1265665:
>    4:     6271..    8192:    1267839..   1269760:   1922:    1267778:
>    5:     8254..   10305:    1269822..   1271873:   2052:    1269761:
>    6:    10367..   12288:    1271935..   1273856:   1922:    1271874:
>    7:    12350..   14401:    1273918..   1275969:   2052:    1273857:
>    8:    14463..   16384:    1276031..   1277952:   1922:    1275970:
>    9:    16446..   18497:    1278014..   1280065:   2052:    1277953:
>   10:    18559..   20480:    1280127..   1282048:   1922:    1280066:
>   11:    20542..   22528:    1282110..   1284096:   1987:    1282049:
>   12:    22590..   24575:    1284158..   1286143:   1986:    1284097:
>   13:    24576..   24576:    1292288..   1292288:      1:    1286144:
>   14:    24638..   25583:    1292350..   1293295:    946:    1292289: last
> /var/tmp/tt: 15 extents found
> 
> 
> Without "-E nodiscard", mke2fs prints the message "Discarding device blocks:
> done" and calls fallocate() (from strace):
>     fallocate(3, 03, 0, 1048576)            = 0
>     fallocate(3, 03, 1048576, 103809024)    = 0
> which is NOT called in the "-E nodiscard" case, but it somehow is discarding
> the allocated blocks anyway during inode table allocation:
> 
> write(1, "Writing inode tables: ", 22)  = 22
> write(1, " 0/13", 5)                    = 5
> write(1, "\10\10\10\10\10", 5)          = 5
> fstat(3, {st_mode=S_IFREG|0644, st_size=104857600, ...}) = 0
> fallocate(3, 03, 267264, 252928)        = 0

Hmmm, unix_zeroout() really ought to be trying ZERO_RANGE before
resorting to PUNCH_HOLE, since ZERO_RANGE fills holes with unwritten
extents and converts written to unwritten.

(Or at least it does on ext4 and xfs...)

Also looking at that function, it's probably time to go back and fix it
for block devices since 'the BLKZEROOUT mess' can be avoided by using
block device fallocate.
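
For illustration, the ordering suggested above might look roughly like the
sketch below. This is not the e2fsprogs unix_zeroout() code; the helper name
zero_range_sketch() and the final write-zeroes fallback are assumptions made
for this example. For reference, the mode 03 in the strace output decodes to
FALLOC_FL_KEEP_SIZE (0x01) | FALLOC_FL_PUNCH_HOLE (0x02).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from e2fsprogs): zero [offset, offset + len)
 * of fd while trying to keep the underlying blocks allocated.
 */
static int zero_range_sketch(int fd, off_t offset, off_t len)
{
	static const char zeros[65536];

	if (len <= 0)
		return 0;

	/* 1. Preferred: ZERO_RANGE. Per the discussion above, ext4 and xfs
	 *    keep the range allocated as unwritten extents. */
	if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
		      offset, len) == 0)
		return 0;

	/* 2. Fallback: PUNCH_HOLE (the "03" mode seen in the strace).
	 *    This deallocates the blocks, which is what fragments a
	 *    preallocated image file. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, len) == 0)
		return 0;

	/* 3. Last resort (an assumption of this sketch): write zeroes. */
	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(zeros) ?
			       (size_t)len : sizeof(zeros);
		ssize_t n = pwrite(fd, zeros, chunk, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		offset += n;
		len -= n;
	}
	return 0;
}
```

A caller would invoke it as zero_range_sketch(fd, offset, len) wherever a
range has to read back as zeroes. On kernels that implement fallocate() for
block devices, the same first step should in principle apply to a block
device fd as well, though that path is untested here.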

--D

> fstat(3, {st_mode=S_IFREG|0644, st_size=104857600, ...}) = 0
> fallocate(3, 03, 8655872, 252928)       = 0
> fstat(3, {st_mode=S_IFREG|0644, st_size=104857600, ...}) = 0
> fallocate(3, 03, 16780288, 252928)      = 0
> :
> :
> 
> 
> Cheers, Andreas

