Date:   Mon, 4 Oct 2021 20:49:58 -0700
From:   Sarthak Kukreti <sarthakkukreti@...omium.org>
To:     linux-ext4@...r.kernel.org
Cc:     Andreas Dilger <adilger@...ger.ca>,
        Gwendal Grignou <gwendal@...omium.org>,
        "Theodore Ts'o" <tytso@....edu>, okiselev@...zon.com
Subject: Re: [PATCH v2] mke2fs: Add extended option for prezeroed storage devices

Hi all,

Thanks for the discussion on the original patch. I wanted to circle
back and ask whether you have any further comments or concerns about
the second version of the patch.

Best
Sarthak

On Mon, Sep 27, 2021 at 3:44 AM Sarthak Kukreti
<sarthakkukreti@...omium.org> wrote:
>
> This patch adds an extended option "assume_storage_prezeroed" to
> mke2fs. When enabled, this option acts as a hint to mke2fs that
> the underlying block device was zeroed before mke2fs was called.
> This allows mke2fs to skip zeroing the inode table and the
> journal, which speeds up filesystem creation.
>
> Additionally, on thinly provisioned storage devices (like Ceph,
> dm-thin, newly created sparse loopback files), reads on unmapped extents
> return zeros. This property allows mke2fs (with assume_storage_prezeroed)
> to avoid pre-allocating metadata space for inode tables for the entire
> filesystem, saving space that would otherwise be allocated for
> zeroed inode tables.
>
> Tests
> -----
> 1) Running 'mke2fs -t ext4' on 10G sparse files on an ext4
> filesystem drops the time taken by mke2fs from 0.09s to 0.04s
> and reduces the initial metadata space allocation (stat on
> sparse file) from 139736 blocks (545M) to 8672 blocks (34M).
>
> 2) On ChromeOS (running Linux kernel 4.19) with dm-thin
> and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
>
> - Time taken by mke2fs drops from 1.07s to 0.08s.
> - Avoiding zeroing out the inode table and journal reduces the
>   initial metadata space allocation from 0.48% to 0.01%.
> - Lazy inode table zeroing results in a further 1.45% of logical
>   volume space getting allocated for inode tables, even if no file
>   data is added to the filesystem. With assume_storage_prezeroed,
>   the metadata allocation remains at 0.01%.
>
> Signed-off-by: Sarthak Kukreti <sarthakkukreti@...omium.org>
> --
> Changes in v2: Added regression test, fixed indentation.
> ---
>  misc/mke2fs.8.in                        |  7 ++++++
>  misc/mke2fs.c                           | 21 ++++++++++++++++-
>  tests/m_assume_storage_prezeroed/expect |  2 ++
>  tests/m_assume_storage_prezeroed/script | 31 +++++++++++++++++++++++++
>  4 files changed, 60 insertions(+), 1 deletion(-)
>  create mode 100644 tests/m_assume_storage_prezeroed/expect
>  create mode 100644 tests/m_assume_storage_prezeroed/script
>
> diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
> index c0b53245..5c6ea5ec 100644
> --- a/misc/mke2fs.8.in
> +++ b/misc/mke2fs.8.in
> @@ -365,6 +365,13 @@ small risk if the system crashes before the journal has been overwritten
>  entirely one time.  If the option value is omitted, it defaults to 1 to
>  enable lazy journal inode zeroing.
>  .TP
> +.B assume_storage_prezeroed\fR[\fB= \fI<0 to disable, 1 to enable>\fR]
> +If enabled,
> +.BR mke2fs
> +assumes that the storage device has been prezeroed, skips zeroing the journal
> +and inode tables, and annotates the block group flags to signal that the inode
> +table has been zeroed.
> +.TP
>  .B no_copy_xattrs
>  Normally
>  .B mke2fs
> diff --git a/misc/mke2fs.c b/misc/mke2fs.c
> index 04b2fbce..24c69966 100644
> --- a/misc/mke2fs.c
> +++ b/misc/mke2fs.c
> @@ -95,6 +95,7 @@ int   journal_size;
>  int    journal_flags;
>  int    journal_fc_size;
>  static int     lazy_itable_init;
> +static int     assume_storage_prezeroed;
>  static int     packed_meta_blocks;
>  int            no_copy_xattrs;
>  static char    *bad_blocks_filename = NULL;
> @@ -1012,6 +1013,11 @@ static void parse_extended_opts(struct ext2_super_block *param,
>                                 lazy_itable_init = strtoul(arg, &p, 0);
>                         else
>                                 lazy_itable_init = 1;
> +               } else if (!strcmp(token, "assume_storage_prezeroed")) {
> +                       if (arg)
> +                               assume_storage_prezeroed = strtoul(arg, &p, 0);
> +                       else
> +                               assume_storage_prezeroed = 1;
>                 } else if (!strcmp(token, "lazy_journal_init")) {
>                         if (arg)
>                                 journal_flags |= strtoul(arg, &p, 0) ?
> @@ -1115,7 +1121,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
>                         "\tnodiscard\n"
>                         "\tencoding=<encoding>\n"
>                         "\tencoding_flags=<flags>\n"
> -                       "\tquotatype=<quota type(s) to be enabled>\n\n"),
> +                       "\tquotatype=<quota type(s) to be enabled>\n"
> +                       "\tassume_storage_prezeroed=<0 to disable, 1 to enable>\n\n"),
>                         badopt ? badopt : "");
>                 free(buf);
>                 exit(1);
> @@ -3095,6 +3102,18 @@ int main (int argc, char *argv[])
>                 io_channel_set_options(fs->io, opt_string);
>         }
>
> +       if (assume_storage_prezeroed) {
> +               if (verbose)
> +                       printf("%s",
> +                              _("Assuming the storage device is prezeroed "
> +                              "- skipping inode table and journal wipe\n"));
> +
> +               lazy_itable_init = 1;
> +               itable_zeroed = 1;
> +               zero_hugefile = 0;
> +               journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
> +       }
> +
>         /* Can't undo discard ... */
>         if (!noaction && discard && dev_size && (io_ptr != undo_io_manager)) {
>                 retval = mke2fs_discard_device(fs);
> diff --git a/tests/m_assume_storage_prezeroed/expect b/tests/m_assume_storage_prezeroed/expect
> new file mode 100644
> index 00000000..2ca3784a
> --- /dev/null
> +++ b/tests/m_assume_storage_prezeroed/expect
> @@ -0,0 +1,2 @@
> +2384
> +336
> diff --git a/tests/m_assume_storage_prezeroed/script b/tests/m_assume_storage_prezeroed/script
> new file mode 100644
> index 00000000..0745fb28
> --- /dev/null
> +++ b/tests/m_assume_storage_prezeroed/script
> @@ -0,0 +1,31 @@
> +test_description="test prezeroed storage metadata allocation"
> +FILE_SIZE=16M
> +
> +LOG=$test_name.log
> +OUT=$test_name.out
> +EXP=$test_dir/expect
> +
> +dd if=/dev/zero of=$TMPFILE.1 bs=1 count=0 seek=$FILE_SIZE >> $LOG 2>&1
> +dd if=/dev/zero of=$TMPFILE.2 bs=1 count=0 seek=$FILE_SIZE >> $LOG 2>&1
> +
> +$MKE2FS -o Linux -t ext4 -O has_journal $TMPFILE.1 >> $LOG 2>&1
> +stat -c "%b" $TMPFILE.1 > $OUT
> +
> +$MKE2FS -o Linux -t ext4 -O has_journal -E assume_storage_prezeroed=1 $TMPFILE.2 >> $LOG 2>&1
> +stat -c "%b" $TMPFILE.2 >> $OUT
> +
> +rm -f $TMPFILE.1 $TMPFILE.2
> +
> +cmp -s $OUT $EXP
> +status=$?
> +
> +if [ "$status" = 0 ] ; then
> +       echo "$test_name: $test_description: ok"
> +       touch $test_name.ok
> +else
> +       echo "$test_name: $test_description: failed"
> +       cat $LOG > $test_name.failed
> +       diff $EXP $OUT >> $test_name.failed
> +fi
> +
> +unset LOG OUT EXP FILE_SIZE
> \ No newline at end of file
> --
> 2.31.0
>
