Message-ID: <20140215031624.GI9176@birch.djwong.org>
Date: Fri, 14 Feb 2014 19:16:24 -0800
From: "Darrick J. Wong" <darrick.wong@...cle.com>
To: "Theodore Ts'o" <tytso@....edu>
Cc: Jon Bernard <jbernard@...ion.com>,
Dmitry Monakhov <dmonakhov@...nvz.org>,
linux-ext4@...r.kernel.org
Subject: Re: kernel bug at fs/ext4/resize.c:409
Per Ted's request, I've started editing a document on the ext4 wiki:
https://ext4.wiki.kernel.org/index.php/Ext4_VM_Images
[comments below too]
On Fri, Feb 14, 2014 at 06:46:31PM -0500, Theodore Ts'o wrote:
> On Fri, Feb 14, 2014 at 03:19:05PM -0500, Jon Bernard wrote:
> > Ahh, I see. Here's where this comes from: the particular use case is
> > provisioning of new cloud instances whose root volume is of unknown
> > size. The filesystem and its contents are created and bundled
> > beforehand into the smallest filesystem possible. The instance is PXE
> > booted for provisioning, the root filesystem is copied onto the
> > disk, and then resized to take advantage of the total amount of space.
> >
> > In order to support very large partitions, the filesystem is created
> > with an abnormally large inode table so that large resizes would be
> > possible. I traced it to this commit as best I can tell:
> >
> > https://github.com/openstack/diskimage-builder/commit/fb246a02eb2ed330d3cc37f5795b3ed026aabe07
> >
> > I assumed that additional inodes would be allocated along with block
> > groups during an online resize, but that commit contradicts my current
> > understanding.
>
> Additional inodes *are* allocated as the file system is grown.
> Whoever thought otherwise was wrong. What happens is that there is a
> fixed number of inodes per block group. When the file system is
> resized, either by growing or shrinking it, block groups
> are added to or removed from the file system, and the corresponding
> inodes are added or removed along with them.
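The fixed inodes-per-group relationship above is easy to see with a little
arithmetic. This is a sketch using assumed mke2fs defaults (4 KiB blocks,
32768 blocks per group, one inode per 16384 bytes); the real values come
from mke2fs.conf and the superblock, not from these constants:

```python
# Assumed mke2fs defaults -- check mke2fs.conf / dumpe2fs on a real fs.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE   # one block bitmap block covers the group
INODE_RATIO = 16384                 # bytes of disk per inode ("inode_ratio")

def inodes_per_group():
    # Inodes are provisioned at a fixed density within every block group.
    return BLOCKS_PER_GROUP * BLOCK_SIZE // INODE_RATIO

def total_inodes(fs_blocks):
    # Resizing changes the group count; the inode count simply follows it.
    groups = -(-fs_blocks // BLOCKS_PER_GROUP)   # ceiling division
    return groups * inodes_per_group()

print(inodes_per_group())        # 8192 inodes in every group
print(total_inodes(32768))       # 1 group  -> 8192 inodes
print(total_inodes(4 * 32768))   # grown 4x -> 32768 inodes
```

So there is no way to grow the inode count independently of the group
count, which is why padding the inode table at mke2fs time doesn't buy
anything for online resize.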
>
> > I suggested that the filesystem be created during the time of
> > provisioning to allow a more optimal on-disk layout, and I believe this
> > is being considered now.
>
> What causes the most damage in terms of a non-optimal data block
> layout is installing the file system image on a large file system and
> then shrinking the file system to its minimum size using resize2fs -M.
> There is also some non-optimality that occurs as the file system gets
> filled beyond about 90% full, but it's not nearly so bad as shrinking
> the file system --- which you should avoid at all costs.
>
> From a performance point of view, the only time you should try to do
> an off-line resize2fs shrink is if you are shrinking the file system
> by a handful of blocks as part of converting a file system in place to
> use LVM or LUKS encryption, and you need to make room for some
> metadata blocks at the end of the partition.
>
> The other thing to note is that if you are using a format such
> as qcow2, or something like the device mapper's thin-provisioning
> (thinp) scheme, or if you are willing to deal with sparse files, one
> approach is to not resize the file system at all. You could just use
> a tool like zerofree[1] to zero out all of the unused blocks in the
> file system, and then use "/bin/cp --sparse=always" to cause all-zero
> blocks to be treated as sparse blocks in the destination file.
>
> [1] http://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/util/zerofree.c
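For what it's worth, the "--sparse=always" behaviour is simple enough to
sketch: copy block by block, but seek over (instead of writing) any block
that is all zeros, so the destination filesystem can leave a hole there.
A minimal Python illustration of the idea (BLK and the helper name are my
own; real cp uses the filesystem block size):

```python
import os

BLK = 4096  # copy granularity; GNU cp uses the fs block size here

def sparse_copy(src, dst):
    """Copy src to dst, turning all-zero blocks into holes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(BLK)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):
                # All zeros: advance the write position without writing,
                # leaving a hole in the destination file.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # If the file ends in a hole, fix up the size explicitly.
        fout.truncate()
```

Run after zerofree has zeroed the unused blocks, this yields a destination
with the same logical contents but with the free space deallocated.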
I have a zerofree variant that knows how to punch/discard blocks that I'll
throw into contrib/ the next time I send out one of my megapatch sets.
> This is part of how I maintain the root filesystem that I use in a VM
> for testing ext4 changes upstream. After I install the latest
> Debian unstable package updates and the latest builds from the
> xfstests and e2fsprogs git repositories, I run the following
> script, which uses the zerofree.c program to compress the qcow2 root
> file system image that I use with kvm:
>
> http://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/compress-rootfs
>
>
> Also, starting with e2fsprogs 1.42.10, there's another way you can
These three options (-rap) are available in 1.42.9. Is there a particular
reason not to use them before 1.42.10?
> efficiently deploy a large file system image by only copying the
> blocks which are in use, by using a command like this:
>
> e2image -rap src_fs dest_fs
>
> (See also the -c flag as described in e2image's man page if you want
> to use this technique to do incremental image-based backups onto a
> flash-based backup medium; I was using this for a while to keep the
> root filesystems on two laptop SSDs in sync with one another.)
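The incremental trick Ted describes boils down to: before writing a block
to the destination, compare it with what is already there, and skip the
write when the contents match, which spares the flash a lot of erase
cycles. A hypothetical Python sketch of that idea (not e2image's actual
implementation; the function name and counters are mine):

```python
import os

BLK = 4096

def incremental_sync(src, dst):
    """Overwrite dst with src, writing only the blocks that differ."""
    written = skipped = 0
    # buffering=0 so interleaved reads and writes hit the file directly
    with open(src, "rb") as fin, open(dst, "r+b", buffering=0) as fout:
        while True:
            new = fin.read(BLK)
            if not new:
                break
            old = fout.read(BLK)
            if new == old:
                skipped += 1                        # already up to date
            else:
                fout.seek(-len(old), os.SEEK_CUR)   # back to block start
                fout.write(new)
                written += 1
        fout.truncate(os.path.getsize(src))         # trim any stale tail
    return written, skipped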
>
> So there are lots of ways that you can do what you need, all without
> playing games with resize2fs. Perhaps some of them would actually be
> better for your use case.
Calvin Watson noted on Ted's G+ repost that one can use fstrim in newer
versions of QEMU (1.5+?) to punch out unused blocks if the virtual disk is
emulated via virtio-scsi.
--D
>
>
> > If it turns out to be not terribly complicated and there is not an
> > immediate time constraint, I would love to try to help with this or at
> > least test patches.
>
> I will hopefully have a bug fix in the next week or two.
>
> Cheers,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html