Message-ID: <20150923151406.GE3318@thunk.org>
Date: Wed, 23 Sep 2015 11:14:06 -0400
From: Theodore Ts'o <tytso@....edu>
To: "Pocas, Jamie" <Jamie.Pocas@....com>
Cc: Eric Sandeen <sandeen@...hat.com>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: resize2fs stuck in ext4_group_extend with 100% CPU Utilization
With Small Volumes
On Wed, Sep 23, 2015 at 12:20:17AM -0400, Pocas, Jamie wrote:
> Ted, just to add another data point, with some minor adjustments to
> the script to use xfs instead, such as using "mkfs.xfs -b size=1024"
> to force 1k blocks, I cannot reproduce the issue and the data block
> size doesn't change from 1k.
Yes, that's not surprising, because XFS doesn't use the buffer cache
layer. Ext4 does, because that's the basis of how the jbd2 layer
works. The block size reported by the block device, which is what the
buffer cache layer uses, does still change, though. (Internally,
this is known as the "soft" block size; it's basically the unit in
which data is cached in the buffer cache layer):
root@...-xfstests:~# truncate -s 100M /tmp/foo.img
root@...-xfstests:~# mkfs.xfs -b size=1024 /tmp/foo.img
meta-data=/tmp/foo.img isize=512 agcount=4, agsize=25600 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=0
data = bsize=1024 blocks=102400, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=1024 blocks=2573, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
root@...-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@...-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@...-xfstests:~# losetup -c /dev/loop0
root@...-xfstests:~# blockdev --getbsz /dev/loop0
4096 <--------- BUG, note the change in the block size
root@...-xfstests:~# touch /mnt/foo
root@...-xfstests:~# sync
<------ The reason why we don't hang is that XFS doesn't use the
<------ buffer cache
root@...-xfstests:~# umount /mnt
Also feel free to try my repro, but run "blockdev --getbsz
/dev/loop0" before and after the losetup -c command, and note that it
does hang even though there is no resize2fs in the command
sequence at all:
root@...-xfstests:~# cp /dev/null /tmp/foo.img
root@...-xfstests:~# truncate -s 100M /tmp/foo.img
root@...-xfstests:~# mke2fs -t ext4 /tmp/foo.img
mke2fs 1.43-WIP (18-May-2015)
Discarding device blocks: done
Creating filesystem with 102400 1k blocks and 25688 inodes
Filesystem UUID: 27dfdbbe-f3a9-48a7-abe8-5a52798a9849
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729
Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done
root@...-xfstests:~# mount -o loop /tmp/foo.img /mnt
root@...-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@...-xfstests:~# losetup -c /dev/loop0
root@...-xfstests:~# blockdev --getbsz /dev/loop0
4096 <------------ BUG
root@...-xfstests:~# touch /mnt/foo
<------- Should hang here, even though there is no resize2fs command
<------- If it doesn't hang right away, try typing the "sync" command
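If you would rather check for the block size change without touching
the mounted file system afterwards, the before/after comparison can be
wrapped up roughly as follows. This is only a sketch: the function
names bsz_changed and check_losetup_c are made up for illustration,
and check_losetup_c needs root and an attached loop device.

```sh
#!/bin/sh
# Compare the "soft" block size before and after a capacity re-read.
# Prints a message and returns 1 if the two values differ.
bsz_changed() {
    before=$1
    after=$2
    if [ "$before" != "$after" ]; then
        echo "BUG: soft block size changed from $before to $after"
        return 1
    fi
    echo "soft block size unchanged at $before"
}

# Hypothetical helper (needs root): ask the loop driver to re-read the
# size of the backing file, then report whether the soft block size
# changed while the device was in use.
check_losetup_c() {
    dev=$1
    before=$(blockdev --getbsz "$dev")
    losetup -c "$dev"
    after=$(blockdev --getbsz "$dev")
    bsz_changed "$before" "$after"
}
```

Running "check_losetup_c /dev/loop0" right after the mount step in the
transcript above would be the intended use.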
> Suffer this small analogy
> for me and let me know where I am wrong: say hypothetically I expand
> a small partition (or LVM for that matter). Then I try to use
> resize2fs to grow the ext filesystem on it. I expect that this
> should *not* change the block size of the underlying device (of
> course not!) nor the filesystem's block size.
The misunderstanding comes from the fact that there are
actually 4 different concepts of block/sector size:
* The logical block/sector size of the underlying storage device
- Retrieved via "blockdev --getss /dev/sdXX"
- This is the smallest unit that can be sent to the disk from
the Host OS. If the logical sector size is different from
the physical block size, and a write is smaller than the
physical sector size (see below), then the disk will do a
read-modify-write.
- The file system block size MUST be greater than or equal to
the logical sector size.
* The physical block/sector size of the underlying storage device
- Retrieved via "blockdev --getpbsz /dev/sdXX"
- This is the smallest unit that can be physically written to the
storage media.
- The file system block size SHOULD be greater than or equal
to the physical sector size. (To avoid read-modify-write
operations by the hard drive that will be bad for performance.)
* The "soft" block size of the block device.
- Retrieved via "blockdev --getbsz /dev/sdXX"
- This represents the unit of storage which is used to cache
data in the buffer cache. This only matters if you are
using the buffer cache --- for example, if you are doing
buffered I/O to a block device, or if you are using a file
system such as ext4 which uses the buffer cache. Since data
is indexed in the buffer cache by the 3-tuple (block device,
block number, block size), Bad Things happen if you try to
change the block size while the file system is mounted.
Normally, the kernel will prevent you from changing the
block size under these circumstances.
* The file system block size.
- Retrieved by some file-system dependent command. For ext4,
this is "dumpe2fs -h".
- Set at format time. For file systems that use the buffer
cache, the file system driver will automatically set the
"soft" block size of the block device when the file system
is mounted.
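To see all four values side by side for an ext4 device, something like
the following could be used. It is a minimal sketch, not a polished
tool: check_block_sizes and report_device are names invented here, and
report_device assumes root plus an ext4 file system on the device. The
check simply encodes the MUST and SHOULD rules from the list above.

```sh
#!/bin/sh
# Classify a file system block size against the device's sector sizes.
# Arguments (all in bytes): fs_block_size logical_sector physical_sector
check_block_sizes() {
    fs_bs=$1
    logical=$2
    physical=$3
    if [ "$fs_bs" -lt "$logical" ]; then
        # Violates the MUST rule: smaller than the logical sector size.
        echo "invalid: fs block size $fs_bs < logical sector size $logical"
        return 1
    fi
    if [ "$fs_bs" -lt "$physical" ]; then
        # Violates the SHOULD rule: the drive has to read-modify-write.
        echo "slow: fs block size $fs_bs < physical sector size $physical"
        return 0
    fi
    echo "ok"
}

# Hypothetical wrapper (needs root): gather the four sizes for an ext4
# device and run the check on them.
report_device() {
    dev=$1
    logical=$(blockdev --getss "$dev")
    physical=$(blockdev --getpbsz "$dev")
    soft=$(blockdev --getbsz "$dev")
    fs_bs=$(dumpe2fs -h "$dev" 2>/dev/null |
        awk -F: '/^Block size:/ { gsub(/ /, "", $2); print $2 }')
    echo "logical=$logical physical=$physical soft=$soft fs=$fs_bs"
    check_block_sizes "$fs_bs" "$logical" "$physical"
}
```

For example, "report_device /dev/sdXX" on a mounted ext4 file system
would normally be expected to show the soft block size equal to the
file system block size.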
Speaking of LVM, I can't reproduce the problem using LVM, at least not
with a 4.3-rc2 kernel:
root@...-xfstests:~# pvcreate /dev/vdc
Physical volume "/dev/vdc" successfully created
root@...-xfstests:~# vgcreate test /dev/vdc
Volume group "test" successfully created
root@...-xfstests:~# lvcreate -L 100M -n small /dev/test
Logical volume "small" created
root@...-xfstests:~# mkfs.ext4 -Fq /dev/test/small
root@...-xfstests:~# mount -o loop /dev/test/small /mnt
root@...-xfstests:~# blockdev --getbsz /dev/loop0
1024
root@...-xfstests:~# lvresize -L 1G /dev/test/small
Size of logical volume test/small changed from 100.00 MiB (25 extents) to 1.00 GiB (256 extents).
Logical volume small successfully resized
root@...-xfstests:~# blockdev --getbsz /dev/loop0
1024 <------ NO BUG, see the block size has not changed
root@...-xfstests:~# lvcreate -L 100M -n small /dev/test^C
root@...-xfstests:~# touch /mnt/foo ; sync
root@...-xfstests:~# resize2fs /dev/test/small
resize2fs 1.43-WIP (18-May-2015)
Filesystem at /dev/test/small is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 8
The filesystem on /dev/test/small is now 1048576 (1k) blocks long.
<------ Note that resize2fs works just fine!
root@...-xfstests:~# touch /mnt/bar ; sync
root@...-xfstests:~# umount /mnt
root@...-xfstests:~#
You might see if this works on CentOS; but if it doesn't, I'm pretty
convinced this is a bug outside of ext4, and I've already given you a
workaround --- using "-b 4096" on the command line to mkfs.ext4 or
mke2fs.
Alternatively, here's another workaround; you can modify your
/etc/mke2fs.conf so the "small" and "floppy" stanzas read:
[fs_types]
small = {
blocksize = 4096
inode_size = 128
inode_ratio = 4096
}
floppy = {
blocksize = 4096
inode_size = 128
inode_ratio = 8192
}
I'm pretty certain your failures won't reproduce if you either change
how you call mke2fs for small file systems, or change your
/etc/mke2fs.conf file as shown above.
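As a rough illustration of the first workaround, the sketch below
formats a small image with an explicit 4k block size and then reads the
block size back with dumpe2fs. The function name make_4k_image and the
image path are placeholders; -q and -F are added only so that mke2fs
runs quietly and does not stop to ask about a regular file.

```sh
#!/bin/sh
# Create a sparse image of the given size, format it as ext4 with 4k
# blocks, and print the block size recorded in the superblock.
make_4k_image() {
    img=$1
    size=$2
    truncate -s "$size" "$img" || return 1
    # -b 4096 overrides the 1k default that mke2fs picks for small
    # file systems.
    mke2fs -q -F -t ext4 -b 4096 "$img" >/dev/null || return 1
    dumpe2fs -h "$img" 2>/dev/null |
        awk -F: '/^Block size:/ { gsub(/ /, "", $2); print $2 }'
}
```

Calling "make_4k_image /tmp/foo.img 100M" should report a 4096-byte
block size if the format succeeded; with the mke2fs.conf change in
place, the explicit -b option should no longer be needed.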
Cheers,
- Ted