Message-Id: <C5A2A75B-F767-40AC-B500-C99D484E9E30@dilger.ca>
Date: Tue, 21 Sep 2021 15:39:51 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Sarthak Kukreti <sarthakkukreti@...omium.org>
Cc: linux-ext4@...r.kernel.org, gwendal@...omium.org, tytso@....edu
Subject: Re: [PATCH] mke2fs: Add extended option for prezeroed storage devices
On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti <sarthakkukreti@...omium.org> wrote:
>
> From: Sarthak Kukreti <sarthakkukreti@...omium.org>
>
> This patch adds an extended option "assume_storage_prezeroed" to
> mke2fs. When enabled, this option acts as a hint to mke2fs that
> the underlying block device was zeroed before mke2fs was called.
> This allows mke2fs to optimize out the zeroing of the inode
> table and the journal, which speeds up the filesystem creation
> time.
>
> Additionally, on thinly provisioned storage devices (like Ceph,
> dm-thin),
... and newly-created sparse loopback files
> reads on unmapped extents return zero. This property
> allows mke2fs (with assume_storage_prezeroed) to avoid
> pre-allocating metadata space for inode tables for the entire
> filesystem and saves space that would normally be preallocated
> for zero inode tables.
>
> Testing on ChromeOS (running linux kernel 4.19) with dm-thin
> and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
>
> - Time taken by mke2fs drops from 1.07s to 0.08s.
> - Avoiding zeroing out the inode table and journal reduces the
> initial metadata space allocation from 0.48% to 0.01%.
> - Lazy inode table zeroing results in a further 1.45% of logical
> volume space getting allocated for inode tables, even if no file
> data is added to the filesystem. With assume_storage_prezeroed,
> the metadata allocation remains at 0.01%.
This seems beneficial, but I'm wondering if this could also be
done automatically when TRIM/DISCARD is used by mke2fs to erase
a device?
One safe option to do this automatically would be to start by
*reading* the disk blocks and check if they are all zero, and only
switch to zero-block writes if any block is found with non-zero
data. That would avoid the extra space usage from zero-block
writes in the above cases, and also work for the vast majority of
users who won't know the "assume_storage_prezeroed" option even
exists, though it won't necessarily reduce the runtime.
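As a rough illustration of the read-before-write idea, here is a minimal
standalone sketch using plain POSIX I/O. It is not based on the libext2fs
io_channel code that mke2fs actually uses; the names region_is_zeroed(),
zero_region_if_needed() and CHUNK_SIZE are hypothetical and chosen only for
this example.

```c
#define _XOPEN_SOURCE 700	/* for pread() and pwrite() */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 65536	/* hypothetical I/O size for this sketch */

/* Return 1 if buf holds only zero bytes, 0 otherwise. */
static int buf_is_zero(const unsigned char *buf, size_t len)
{
	/* First byte is zero and every byte equals its successor. */
	return len == 0 ||
	       (buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0);
}

/*
 * Return 1 if [offset, offset + len) reads back as all zeros,
 * 0 if any non-zero byte is found, -1 on I/O error or EOF.
 */
static int region_is_zeroed(int fd, off_t offset, off_t len)
{
	unsigned char *buf = malloc(CHUNK_SIZE);
	int ret = 1;

	if (!buf)
		return -1;
	while (len > 0) {
		size_t want = len < CHUNK_SIZE ? (size_t)len : CHUNK_SIZE;
		ssize_t got = pread(fd, buf, want, offset);

		if (got <= 0) {
			ret = -1;
			break;
		}
		if (!buf_is_zero(buf, (size_t)got)) {
			ret = 0;
			break;
		}
		offset += got;
		len -= got;
	}
	free(buf);
	return ret;
}

/*
 * Write zeros only if the region is not already zero.
 * Return 1 if the write was skipped, 0 if zeros were written,
 * -1 on error.
 */
static int zero_region_if_needed(int fd, off_t offset, off_t len)
{
	int zeroed = region_is_zeroed(fd, offset, len);
	unsigned char *buf;
	int ret = 0;

	if (zeroed != 0)
		return zeroed;	/* already zero, or read error */
	buf = calloc(1, CHUNK_SIZE);
	if (!buf)
		return -1;
	while (len > 0) {
		size_t want = len < CHUNK_SIZE ? (size_t)len : CHUNK_SIZE;
		ssize_t put = pwrite(fd, buf, want, offset);

		if (put <= 0) {
			ret = -1;
			break;
		}
		offset += put;
		len -= put;
	}
	free(buf);
	return ret;
}
```

On a thin volume or sparse file the read path should not allocate any
backing space, so the zero writes (and the space they consume) only happen
when stale data is really present. The cost is that every region is read
once, which is why the runtime is not necessarily reduced.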
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 04b2fbce..5293d9b0 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -3095,6 +3102,18 @@ int main (int argc, char *argv[])
> io_channel_set_options(fs->io, opt_string);
> }
>
> + if (assume_storage_prezeroed) {
> + if (verbose)
> + printf("%s",
> + _("Assuming the storage device is prezeroed "
> + "- skipping inode table and journal wipe\n"));
> +
> + lazy_itable_init = 1;
> + itable_zeroed = 1;
> + zero_hugefile = 0;
> + journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
> + }
Indentation appears to be broken here - only 2 spaces instead of a tab.
This is also missing any kind of test case. Since a large number of
the e2fsck test cases use loopback filesystems created on a sparse
file, they would both make good test cases for this option and reduce
the time/space used during testing.
Cheers, Andreas