Message-ID: <20150923151406.GE3318@thunk.org>
Date: Wed, 23 Sep 2015 11:14:06 -0400
From: Theodore Ts'o <tytso@....edu>
To: "Pocas, Jamie" <Jamie.Pocas@....com>
Cc: Eric Sandeen <sandeen@...hat.com>, "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization With Small Volumes

On Wed, Sep 23, 2015 at 12:20:17AM -0400, Pocas, Jamie wrote:
> Ted, just to add another data point, with some minor adjustments to
> the script to use xfs instead, such as using "mkfs.xfs -b size=1024"
> to force 1k blocks, I cannot reproduce the issue and the data block
> size doesn't change from 1k.

Yes, that's not surprising, because XFS doesn't use the buffer cache
layer. Ext4 does, because that's the basis of how the jbd2 layer
works.

The block size as reported by the block device, which is what the
buffer cache layer uses, does change, though. (Internally, this is
known as the "soft" block size; it's basically the unit in which data
is cached in the buffer cache layer):

root@...-xfstests:~# truncate -s 100M /tmp/foo.img
root@...-xfstests:~# mkfs.xfs -b size=1024 /tmp/foo.img
meta-data=/tmp/foo.img           isize=512    agcount=4, agsize=25600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0
data     =                       bsize=1024   blocks=102400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=1024   blocks=2573, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@...-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@...-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@...-xfstests:~# losetup -c /dev/loop0
root@...-xfstests:~# blockdev --getbsz /dev/loop0
4096     <--------- BUG, note the change in the block size
root@...-xfstests:~# touch /mnt/foo
root@...-xfstests:~# sync
    <------ The reason why we don't hang is that XFS doesn't use the
    <------ buffer cache
root@...-xfstests:~# umount /mnt

Also feel free to try my repro,
but using "blockdev --getbsz /dev/loop0" before and after the losetup -c
command, and note that it does hang even though there is no resize2fs
in the command sequence at all:

root@...-xfstests:~# cp /dev/null /tmp/foo.img
root@...-xfstests:~# truncate -s 100M /tmp/foo.img
root@...-xfstests:~# mke2fs -t ext4 /tmp/foo.img
mke2fs 1.43-WIP (18-May-2015)
Discarding device blocks: done
Creating filesystem with 102400 1k blocks and 25688 inodes
Filesystem UUID: 27dfdbbe-f3a9-48a7-abe8-5a52798a9849
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

root@...-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@...-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@...-xfstests:~# losetup -c /dev/loop0
root@...-xfstests:~# blockdev --getbsz /dev/loop0
4096     <------------ BUG
root@...-xfstests:~# touch /mnt/foo
    <------- Should hang here, even though there is no resize2fs command
    <------- If it doesn't hang right away, try typing the "sync" command

> Suffer this small analogy
> for me and let me know where I am wrong: say hypothetically I expand
> a small partition (or LVM for that matter). Then I try to use
> resize2fs to grow the ext filesystem on it. I expect that this
> should *not* change the block size of the underlying device (of
> course not!) nor the filesystem's block size.

The cause of the misunderstanding is that there are actually 4
different concepts of block/sector size:

* The logical block/sector size of the underlying storage device
    - Retrieved via "blockdev --getss /dev/sdXX"
    - This is the smallest unit that can be sent to the disk from the
      Host OS. If the logical sector size is different from the
      physical sector size, and a write is smaller than the physical
      sector size (see below), then the disk will do a
      read-modify-write.
    - The file system block size MUST be greater than or equal to the
      logical sector size.

* The physical block/sector size of the underlying storage device
    - Retrieved via "blockdev --getpbsz /dev/sdXX"
    - This is the smallest unit that can be physically written to the
      storage media.
    - The file system block size SHOULD be greater than or equal to
      the physical sector size. (To avoid read-modify-write operations
      by the hard drive that will be bad for performance.)

* The "soft" block size of the block device.
    - Retrieved via "blockdev --getbsz /dev/sdXX"
    - This represents the unit of storage which is used to cache data
      in the buffer cache. This only matters if you are using the
      buffer cache --- for example, if you are doing buffered I/O to a
      block device, or if you are using a file system such as ext4
      which uses the buffer cache. Since data is indexed in the
      buffer cache by the 3-tuple (block device, block number, block
      size), Bad Things happen if you try to change the block size
      while the file system is mounted. Normally, the kernel will
      prevent you from changing the block size under these
      circumstances.

* The file system block size.
    - Retrieved by some file-system dependent command. For ext4, this
      is "dumpe2fs -h".
    - Set at format time. For file systems that use the buffer cache,
      the file system driver will automatically set the "soft" block
      size of the block device when the file system is mounted.

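The MUST and SHOULD rules in the list above can be written down as a
small shell check. This is only a sketch: check_block_sizes is a
hypothetical helper name, not part of util-linux or e2fsprogs, and it
takes the three sizes as plain numbers so that it can be tried without
root or a real device.

```shell
#!/bin/sh
# check_block_sizes LOGICAL PHYSICAL FS_BLOCK   (hypothetical helper)
# Returns 2 if the MUST rule is violated (fs block < logical sector),
#         1 if only the SHOULD rule is violated (fs block < physical sector),
#         0 otherwise.
check_block_sizes() {
    logical=$1
    physical=$2
    fs_block=$3
    if [ "$fs_block" -lt "$logical" ]; then
        echo "error: fs block size $fs_block < logical sector size $logical"
        return 2
    fi
    if [ "$fs_block" -lt "$physical" ]; then
        echo "warn: fs block size $fs_block < physical sector size $physical"
        return 1
    fi
    echo "ok"
    return 0
}

# A 1k-block file system on a device with 512-byte logical and physical sectors.
check_block_sizes 512 512 1024

# On a real ext4 device (needs root) the inputs could be gathered like this:
#   dev=/dev/sdXX
#   check_block_sizes "$(blockdev --getss "$dev")" \
#                     "$(blockdev --getpbsz "$dev")" \
#                     "$(dumpe2fs -h "$dev" 2>/dev/null |
#                        awk -F: '/^Block size:/ {print $2 + 0}')"
```

Note that the "soft" block size is deliberately not an input here: it is
set by the file system driver at mount time and is not something to
validate against the device.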
Speaking of LVM, I can't reproduce the problem using LVM, at least not
with a 4.3-rc2 kernel:

root@...-xfstests:~# pvcreate /dev/vdc
  Physical volume "/dev/vdc" successfully created
root@...-xfstests:~# vgcreate test /dev/vdc
  Volume group "test" successfully created
root@...-xfstests:~# lvcreate -L 100M -n small /dev/test
  Logical volume "small" created
root@...-xfstests:~# mkfs.ext4 -Fq /dev/test/small
root@...-xfstests:~# mount -o loop /dev/test/small /mnt
root@...-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@...-xfstests:~# lvresize -L 1G /dev/test/small
  Size of logical volume test/small changed from 100.00 MiB (25 extents) to 1.00 GiB (256 extents).
  Logical volume small successfully resized
root@...-xfstests:~# blockdev --getbsz /dev/loop0
1024     <------ NO BUG, see the block size has not changed
root@...-xfstests:~# lvcreate -L 100M -n small /dev/test^C
root@...-xfstests:~# touch /mnt/foo ; sync
root@...-xfstests:~# resize2fs /dev/test/small
resize2fs 1.43-WIP (18-May-2015)
Filesystem at /dev/test/small is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 8
The filesystem on /dev/test/small is now 1048576 (1k) blocks long.
    <------ Note that resize2fs works just fine!
root@...-xfstests:~# touch /mnt/bar ; sync
root@...-xfstests:~# umount /mnt
root@...-xfstests:~#

You might see if this works on CentOS; but if it doesn't, I'm pretty
convinced this is a bug outside of ext4, and I've already given you a
workaround --- using "-b 4096" on the command line to mkfs.ext4 or
mke2fs.

Alternatively, here's another workaround; you can modify your
/etc/mke2fs.conf so the "small" and "floppy" stanzas read:

[fs_types]
        small = {
                blocksize = 4096
                inode_size = 128
                inode_ratio = 4096
        }
        floppy = {
                blocksize = 4096
                inode_size = 128
                inode_ratio = 8192
        }

I'm pretty certain your failures won't reproduce if you either change
how you call mke2fs for small file systems, or change your
/etc/mke2fs.conf file as shown above.

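For context on why only the "small" and "floppy" stanzas matter here:
according to the mke2fs(8) man page (an assumption brought in from
there, not something stated in this thread), mke2fs selects the
"floppy" type for file systems under 3 megabytes and the "small" type
for those of at least 3 but under 512 megabytes. The sketch below
mirrors that selection; mke2fs_size_type is a hypothetical helper name
and is not part of e2fsprogs.

```shell
#!/bin/sh
# mke2fs_size_type SIZE_MIB   (hypothetical helper)
# Prints which size-based fs_types stanza would apply to a file system of
# SIZE_MIB megabytes. The thresholds are assumed from the mke2fs(8) man
# page; the larger types (big, huge) are lumped into "default" for brevity.
mke2fs_size_type() {
    size_mib=$1
    if [ "$size_mib" -lt 3 ]; then
        echo floppy
    elif [ "$size_mib" -lt 512 ]; then
        echo small
    else
        echo default
    fi
}

# The 100M image from the repro falls under the "small" stanza.
mke2fs_size_type 100
```

This would explain why the 100M test images in this thread were
formatted with 1k blocks, and why a volume that is later grown to 1G
still keeps the block size it was given at format time.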
Cheers,

                                        - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html