linux-ext4 - Re: [PATCH] mke2fs: Add extended option for prezeroed storage devices

Message-ID: <CAG9=OMNMbF_cMr3QXFDwr6yeeCHFv++YEA=0ZAJ_7VXxE8Zrsg@mail.gmail.com>
Date:   Mon, 27 Sep 2021 03:43:45 -0700
From:   Sarthak Kukreti <sarthakkukreti@...omium.org>
To:     Andreas Dilger <adilger@...ger.ca>
Cc:     linux-ext4@...r.kernel.org, Gwendal Grignou <gwendal@...omium.org>,
        "Theodore Ts'o" <tytso@....edu>
Subject: Re: [PATCH] mke2fs: Add extended option for prezeroed storage devices

Thanks for reviewing the patch, Andreas!

On Tue, Sep 21, 2021 at 2:39 PM Andreas Dilger <adilger@...ger.ca> wrote:
>
> On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti <sarthakkukreti@...omium.org> wrote:
> > From: Sarthak Kukreti <sarthakkukreti@...omium.org>
> >
...
> > Additionally, on thinly provisioned storage devices (like Ceph,
> > dm-thin),
>
> ... and newly-created sparse loopback files
>
Thanks for pointing that out, added to the commit message in v2.
...
> > Testing on ChromeOS (running linux kernel 4.19) with dm-thin
> > and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
> >
> > - Time taken by mke2fs drops from 1.07s to 0.08s.
> > - Avoiding zeroing out the inode table and journal reduces the
> >  initial metadata space allocation from 0.48% to 0.01%.
> > - Lazy inode table zeroing results in a further 1.45% of logical
> >  volume space getting allocated for inode tables, even if no file
> >  data is added to the filesystem. With assume_storage_prezeroed,
> >  the metadata allocation remains at 0.01%.
>
> This seems beneficial, but I'm wondering if this could also be
> done automatically when TRIM/DISCARD is used by mke2fs to erase
> a device?
>
> One safe option to do this automatically would be to start by
> *reading* the disk blocks and check if they are all zero, and only
> switch to zero-block writes if any block is found with non-zero
> data.  That would avoid the extra space usage from zero-block
> writes in the above cases, and also work for the huge majority of
> users that won't know the "assume_storage_prezeroed" option even
> exists, though it won't necessarily reduce the runtime.
>
I agree with Ted (quoting a reply on a forked thread below) that
reading all inode table blocks on the device will slow down mke2fs a
lot depending on the storage medium and size. Maybe it can be done
instead at first mount in conjunction with lazy_itable_init, i.e. ext4
reads the block and only issues a zero-out if the block is not already
zero? Even so, an explicit hint would be compatible with this
approach: it avoids (unnecessarily) reading through all the inode
table blocks as long as the hint was passed at creation time.
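
To make the read-before-zero idea concrete, here is a minimal sketch of the
per-block check. It is only an illustration operating on an in-memory buffer:
the helper names block_is_zero() and zero_block_if_needed() are hypothetical
and are not taken from e2fsprogs or the ext4 lazy-init code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if every byte of the block is zero, 0 otherwise. */
int block_is_zero(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}

/*
 * Hypothetical helper: zero the block only when it holds non-zero data.
 * Returns 1 if a zeroing write was needed, 0 if it was skipped. On a
 * thin-provisioned device, skipping the write is what avoids allocating
 * backing space for an already-zero block.
 */
int zero_block_if_needed(unsigned char *buf, size_t len)
{
	if (block_is_zero(buf, len))
		return 0;
	memset(buf, 0, len);
	return 1;
}
```

The scan still costs one read per block, which is the overhead an explicit
hint at creation time would avoid.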

On Wed, Sep 22, 2021 at 8:57 PM Theodore Ts'o <tytso@....edu> wrote:
> The problem is mke2fs really does need to care about the performance
> of discard or write same.  Users want mke2fs to be fast, especially
> during the distro installation process.  That's why we implemented the
> lazy inode table initialization feature in the first place.  So
> reading each block from the inode table to see if it's zero might
> be slow, and so we might be better off just doing the lazy itable init
> instead.
...
> > +     if (assume_storage_prezeroed) {
> > +       if (verbose)
> > +                     printf("%s",
> > +                                    _("Assuming the storage device is prezeroed "
> > +                         "- skipping inode table and journal wipe\n"));
> > +
> > +       lazy_itable_init = 1;
> > +       itable_zeroed = 1;
> > +       zero_hugefile = 0;
> > +       journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
> > +     }
>
> Indentation appears to be broken here - only 2 spaces instead of a tab.
>
> This is also missing any kind of test case.  Since a large number of
> the e2fsck test cases are using loopback filesystems created on a sparse
> file, this would both be good test cases, as well as reducing time/space
> used during testing.
>
Oops, thanks for catching that! Fixed in v2 and I added a test case
for this option. I was playing around with adding the option as a
default to tests/mke2fs.conf.in; that didn't affect the overall test
run time much (a lot of the tests seem to be dd'ing entire files and
not using sparse files).
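
For reference, a small sketch of why a newly created sparse file behaves like
prezeroed storage: extending a file with ftruncate() leaves a hole that reads
back as zeros. The helper name sparse_file_reads_zero() and the /tmp path are
assumptions made for illustration; this is not code from the e2fsprogs test
suite. Whether the hole actually consumes no disk blocks depends on the
underlying filesystem, so the sketch only reports st_blocks rather than
relying on a particular value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a sparse temporary file of `size` bytes,
 * read `len` bytes at `offset`, and report whether they are all zero.
 * Returns 1 (all zero), 0 (non-zero data found) or -1 on error. If
 * blocks_out is non-NULL it receives st_blocks for the file.
 */
int sparse_file_reads_zero(off_t size, off_t offset, size_t len,
			   long long *blocks_out)
{
	char path[] = "/tmp/sparse-demo-XXXXXX";
	unsigned char buf[4096];
	struct stat st;
	ssize_t n;
	size_t i;
	int fd, result = 1;

	assert(len <= sizeof(buf));
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */

	/* Extending with ftruncate() creates a hole that reads as zeros. */
	if (ftruncate(fd, size) < 0 || fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	n = pread(fd, buf, len, offset);
	if (n != (ssize_t)len) {
		close(fd);
		return -1;
	}
	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			result = 0;
	if (blocks_out)
		*blocks_out = (long long)st.st_blocks;
	close(fd);
	return result;
}
```

A file filled by dd'ing zeros would read back the same way, but its blocks
are actually written, which is the difference noted above for tests that
dd entire files.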

Best
Sarthak