Message-ID: <0A4B11C1-A119-4733-A841-683889E9DC7B@amazon.com>
Date:   Thu, 23 Sep 2021 03:31:00 +0000
From:   "Kiselev, Oleg" <okiselev@...zon.com>
To:     Andreas Dilger <adilger@...ger.ca>
CC:     "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] mke2fs: Add extended option for prezeroed storage devices

Wouldn't it make more sense to issue a "write-same" of zeros instead of writing a page of zeros, and leave it to the layers that do thin provisioning (and return 0 on reads from unallocated blocks) to check whether a block exists before writing zeros to it?
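
To make the suggestion concrete, here is a minimal sketch, not taken from e2fsprogs. It assumes Linux and uses the BLKZEROOUT ioctl to ask the kernel to zero a range on a block device, which lets the kernel offload the zeroing where the device supports it. The names zero_range, ZERO_OFFLOADED and ZERO_WRITTEN are hypothetical. For anything that is not a block device, or if the ioctl fails, it falls back to writing a buffer of zeros. BLKZEROOUT generally expects the range to be aligned to the device's logical block size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKZEROOUT */

/* Hypothetical result codes for this sketch. */
enum zero_method { ZERO_FAILED = -1, ZERO_OFFLOADED = 0, ZERO_WRITTEN = 1 };

/*
 * Zero [off, off + len) on fd. On a block device, first ask the kernel
 * to do it via BLKZEROOUT; otherwise (or on failure) write zeros by hand.
 */
static enum zero_method zero_range(int fd, uint64_t off, uint64_t len)
{
	static const char zeros[4096];
	struct stat st;

	assert(fd >= 0);
	if (fstat(fd, &st) < 0)
		return ZERO_FAILED;

	if (S_ISBLK(st.st_mode)) {
		uint64_t range[2] = { off, len };	/* start, length in bytes */

		if (ioctl(fd, BLKZEROOUT, range) == 0)
			return ZERO_OFFLOADED;
	}

	/* Fallback: explicit zero writes, one buffer at a time. */
	while (len > 0) {
		size_t n = len < sizeof(zeros) ? (size_t)len : sizeof(zeros);
		ssize_t w = pwrite(fd, zeros, n, (off_t)off);

		if (w <= 0)
			return ZERO_FAILED;
		off += (uint64_t)w;
		len -= (uint64_t)w;
	}
	return ZERO_WRITTEN;
}
```

Whether a thin-provisioning layer below then skips allocation for blocks that are already unmapped is up to that layer; the sketch only shows the caller's side.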

On 9/21/21, 2:40 PM, "Andreas Dilger" <adilger@...ger.ca> wrote:

    On Sep 20, 2021, at 9:42 PM, Sarthak Kukreti <sarthakkukreti@...omium.org> wrote:
    > 
    > From: Sarthak Kukreti <sarthakkukreti@...omium.org>
    > 
    > This patch adds an extended option "assume_storage_prezeroed" to
    > mke2fs. When enabled, this option acts as a hint to mke2fs that
    > the underlying block device was zeroed before mke2fs was called.
    > This allows mke2fs to optimize out the zeroing of the inode
    > table and the journal, which speeds up the filesystem creation
    > time.
    > 
    > Additionally, on thinly provisioned storage devices (like Ceph,
    > dm-thin),

    ... and newly-created sparse loopback files

    > reads on unmapped extents return zero. This property
    > allows mke2fs (with assume_storage_prezeroed) to avoid
    > pre-allocating metadata space for inode tables for the entire
    > filesystem and saves space that would normally be preallocated
    > for zero inode tables.
    > 
    > Testing on ChromeOS (running linux kernel 4.19) with dm-thin
    > and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
    > 
    > - Time taken by mke2fs drops from 1.07s to 0.08s.
    > - Avoiding zeroing out the inode table and journal reduces the
    >  initial metadata space allocation from 0.48% to 0.01%.
    > - Lazy inode table zeroing results in a further 1.45% of logical
    >  volume space getting allocated for inode tables, even if no file
    >  data is added to the filesystem. With assume_storage_prezeroed,
    >  the metadata allocation remains at 0.01%.

    This seems beneficial, but I'm wondering if this could also be
    done automatically when TRIM/DISCARD is used by mke2fs to erase
    a device?

    One safe option to do this automatically would be to start by
    *reading* the disk blocks and check if they are all zero, and only
    switch to zero-block writes if any block is found with non-zero
    data.  That would avoid the extra space usage from zero-block
    writes in the above cases, and also work for the huge majority of
    users that won't know the "assume_storage_prezeroed" option even
    exists, though it won't necessarily reduce the runtime.
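
The read-before-write idea above could be sketched roughly as follows. This is an illustration only, not code from mke2fs: the helper names (buf_is_zero, zero_if_needed) and the 4096-byte CHUNK size are assumptions. It reads each chunk, and writes zeros only where non-zero data is found, so already-zero (or unmapped) regions are never written.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096	/* hypothetical I/O unit, not taken from mke2fs */

/* Return 1 if every byte in buf is zero, 0 otherwise. */
static int buf_is_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}

/*
 * Make [off, off + len) read back as zeros, writing only the chunks
 * that currently hold non-zero data. Returns the number of chunks
 * that had to be written, or -1 on I/O error.
 */
static long zero_if_needed(int fd, off_t off, size_t len)
{
	static const unsigned char zeros[CHUNK];
	unsigned char buf[CHUNK];
	long written = 0;

	assert(fd >= 0);
	while (len > 0) {
		size_t want = len < CHUNK ? len : CHUNK;
		ssize_t got = pread(fd, buf, want, off);

		if (got < 0)
			return -1;
		if (got == 0)
			break;	/* reached end of file or device */
		if (!buf_is_zero(buf, (size_t)got)) {
			if (pwrite(fd, zeros, (size_t)got, off) != got)
				return -1;
			written++;
		}
		off += got;
		len -= (size_t)got;
	}
	return written;
}
```

As noted above, this trades writes for reads, so it saves space on thin or sparse storage but does not necessarily make mke2fs faster.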

    > diff --git a/misc/mke2fs.c b/misc/mke2fs.c
    > index 04b2fbce..5293d9b0 100644
    > --- a/misc/mke2fs.c
    > +++ b/misc/mke2fs.c
    > @@ -3095,6 +3102,18 @@ int main (int argc, char *argv[])
    > 		io_channel_set_options(fs->io, opt_string);
    > 	}
    > 
    > +	if (assume_storage_prezeroed) {
    > +	  if (verbose)
    > +			printf("%s",
    > +				       _("Assuming the storage device is prezeroed "
    > +                         "- skipping inode table and journal wipe\n"));
    > +
    > +	  lazy_itable_init = 1;
    > +	  itable_zeroed = 1;
    > +	  zero_hugefile = 0;
    > +	  journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
    > +	}

    Indentation appears to be broken here - only 2 spaces instead of a tab.

    This is also missing any kind of test case.  Since a large number of
    the e2fsck test cases use loopback filesystems created on a sparse
    file, these would both make good test cases and reduce the time/space
    used during testing.
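
For reference, the sparse-file property those tests would rely on can be checked with a small sketch like the one below. It is not part of the e2fsprogs test suite; the helper names and the /tmp path are hypothetical. It creates a file of a given size without writing any data, reports how much backing storage is allocated, and checks that an untouched region reads back as zeros.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked temporary file of `size` bytes with no data written. */
static int make_sparse_file(off_t size)
{
	char path[] = "/tmp/sparse-imgXXXXXX";	/* hypothetical location */
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Backing storage actually allocated; st_blocks counts 512-byte units. */
static long long allocated_bytes(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return (long long)st.st_blocks * 512;
}

/* Return 1 if `len` bytes at `off` read back as zeros (len <= 4096). */
static int reads_as_zero(int fd, off_t off, size_t len)
{
	unsigned char buf[4096];

	assert(len <= sizeof(buf));
	if (pread(fd, buf, len, off) != (ssize_t)len)
		return 0;
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}
```

On filesystems that support sparse files, allocated_bytes() should stay far below the file size until data is written; the exact value depends on the filesystem.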

    Cheers, Andreas