Message-ID: <CAG9=OMNMbF_cMr3QXFDwr6yeeCHFv++YEA=0ZAJ_7VXxE8Zrsg@mail.gmail.com>
Date: Mon, 27 Sep 2021 03:43:45 -0700
From: Sarthak Kukreti <sarthakkukreti@...omium.org>
To: Andreas Dilger <adilger@...ger.ca>
Cc: linux-ext4@...r.kernel.org, Gwendal Grignou <gwendal@...omium.org>,
"Theodore Ts'o" <tytso@....edu>
Subject: Re: [PATCH] mke2fs: Add extended option for prezeroed storage devices
Thanks for reviewing the patch, Andreas!
On Tue, Sep 21, 2021 at 2:39 PM Andreas Dilger <adilger@...ger.ca> wrote:
>
> On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti <sarthakkukreti@...omium.org> wrote:
> > From: Sarthak Kukreti <sarthakkukreti@...omium.org>
> >
...
> > Additionally, on thinly provisioned storage devices (like Ceph,
> > dm-thin),
>
> ... and newly-created sparse loopback files
>
Thanks for pointing that out, added to the commit message in v2.
...
> > Testing on ChromeOS (running linux kernel 4.19) with dm-thin
> > and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
> >
> > - Time taken by mke2fs drops from 1.07s to 0.08s.
> > - Avoiding zeroing out the inode table and journal reduces the
> > initial metadata space allocation from 0.48% to 0.01%.
> > - Lazy inode table zeroing results in a further 1.45% of logical
> > volume space getting allocated for inode tables, even if no file
> > data is added to the filesystem. With assume_storage_prezeroed,
> > the metadata allocation remains at 0.01%.
>
> This seems beneficial, but I'm wondering if this could also be
> done automatically when TRIM/DISCARD is used by mke2fs to erase
> a device?
>
> One safe option to do this automatically would be to start by
> *reading* the disk blocks and check if they are all zero, and only
> switch to zero-block writes if any block is found with non-zero
> data. That would avoid the extra space usage from zero-block
> writes in the above cases, and also work for the huge majority of
> users that won't know the "assume_storage_prezeroed" option even
> exists, though it won't necessarily reduce the runtime.
>
I agree with Ted (quoting a reply on a forked thread below) that
reading all inode table blocks on the device will slow down mke2fs a
lot depending on the storage medium and size. Maybe it can be done
instead at first mount in conjunction with lazy_itable_init, i.e., ext4
reads the block and only issues a zero-out if the block is not already
zero? Even so, an explicit hint would be compatible with this
approach: it avoids (unnecessarily) reading through all the inode
table blocks as long as the hint was passed at creation time.
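
A minimal sketch of the read-then-zero idea above, for illustration only.
The helper names (block_is_zero, needs_zeroout) and the hint_prezeroed
parameter are hypothetical and are not taken from e2fsprogs or the ext4
code; the sketch only shows the decision, not the I/O:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the len bytes at buf are all zero, 0 otherwise.
 * Hypothetical helper; not an e2fsprogs or kernel function.
 */
static int block_is_zero(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	if (len == 0)
		return 1;
	/* First byte is zero and every byte equals its successor => all zero. */
	return buf[0] == 0 && memcmp(buf, buf + 1, len - 1) == 0;
}

/*
 * Decide whether an inode table block needs a zero-out write.
 * hint_prezeroed models a creation-time hint (an assumption of this
 * sketch): when set, the block is trusted and never inspected.
 */
static int needs_zeroout(const unsigned char *buf, size_t len,
			 int hint_prezeroed)
{
	if (hint_prezeroed)
		return 0;
	return !block_is_zero(buf, len);
}
```

With the hint set, the caller would not need to read the block at all,
which is the saving argued for above; without it, only blocks that
actually contain non-zero data would be rewritten.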
On Wed, Sep 22, 2021 at 8:57 PM Theodore Ts'o <tytso@....edu> wrote:
> The problem is mke2fs really does need to care about the performance
> of discard or write same. Users want mke2fs to be fast, especially
> during the distro installation process. That's why we implemented the
> lazy inode table initialization feature in the first place. So
> reading each block from the inode table to see if it's zero might
> be slow, and so we might be better off just doing the lazy itable init
> instead.
...
> > + if (assume_storage_prezeroed) {
> > + if (verbose)
> > + printf("%s",
> > + _("Assuming the storage device is prezeroed "
> > + "- skipping inode table and journal wipe\n"));
> > +
> > + lazy_itable_init = 1;
> > + itable_zeroed = 1;
> > + zero_hugefile = 0;
> > + journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
> > + }
>
> Indentation appears to be broken here - only 2 spaces instead of a tab.
>
> This is also missing any kind of test case. Since a large number of
> the e2fsck test cases are using loopback filesystems created on a sparse
> file, this would both be good test cases, as well as reducing time/space
> used during testing.
>
Oops, thanks for catching that! Fixed in v2 and I added a test case
for this option. I was playing around with adding the option as a
default to tests/mke2fs.conf.in; that didn't affect the overall test
run time much (a lot of the tests seem to be dd'ing entire files and
not using sparse files).
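
For reference, the difference between a sparse image and a dd'ed one can
be sketched as below. This is only an illustration, assuming Linux, where
st_blocks is reported in 512-byte units; make_sparse_image is a
hypothetical helper and not part of the e2fsprogs test suite:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or truncate) path as a sparse file of the given logical size
 * and return the number of 512-byte units actually allocated, or -1 on
 * error. Hypothetical helper for illustration.
 */
static long long make_sparse_image(const char *path, off_t size)
{
	struct stat st;
	int fd;

	assert(path != NULL && size >= 0);
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	/* ftruncate() extends the file without writing data, leaving a hole. */
	if (ftruncate(fd, size) < 0 || fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long long)st.st_blocks;
}
```

An image created this way allocates almost nothing until it is written
to, whereas a file filled with dd allocates its full size up front, so a
prezeroed hint has nothing to save there.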
Best,
Sarthak