Date:   Mon, 25 Oct 2021 00:25:54 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Sarthak Kukreti <sarthakkukreti@...omium.org>
Cc:     linux-ext4@...r.kernel.org, adilger@...ger.ca,
        gwendal@...omium.org, okiselev@...zon.com
Subject: Re: [PATCH v2] mke2fs: Add extended option for prezeroed storage
 devices

I tried running the regression test, and it was failing for me; it
showed that even with -E assume_storage_prezeroed, the sizes of
$TMPFILE.1 and $TMPFILE.2 were the same.  Looking into this, it was
because in lib/ext2fs/unix_io.c, when the file is a plain file,
io_channel_discard_zeroes_data() returns true, since it assumes that
we can use PUNCH_HOLE to implement unix_io_discard(), which is
guaranteed to work.
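
For anyone following along: the "size" being compared here is the
number of allocated blocks (stat -c "%b"), not the file length.  A
minimal sketch of that measurement, assuming GNU coreutils on Linux;
the temporary file and the variable names are illustrative only and
are not part of the patch:

```shell
#!/bin/sh
# Sketch: apparent size vs. allocated blocks of a sparse file.
# Assumes GNU dd and GNU stat (-c) on Linux; names are hypothetical.
set -eu

f=$(mktemp)
trap 'rm -f "$f"' EXIT

# Same trick as the regression test: extend to 16M without writing data.
dd if=/dev/zero of="$f" bs=1 count=0 seek=16M 2>/dev/null

apparent=$(stat -c '%s' "$f")   # file length in bytes
before=$(stat -c '%b' "$f")     # allocated blocks, in units of %B bytes

# Write 1 MiB of non-zero data in place and flush it to the file.
dd if=/dev/urandom of="$f" bs=1M count=1 conv=notrunc,fsync 2>/dev/null
after=$(stat -c '%b' "$f")      # allocated blocks after the write

echo "apparent=$apparent before=$before after=$after"
```

The exact block counts depend on the filesystem holding the file; the
point is only that the allocated count moves independently of the
apparent length.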

So I had to change the regression test to use losetup, which also
meant that the test had to run as root....

Anyway, this is what I've checked into e2fsprogs.

      	       	    	       	  - Ted

commit bd2e72c5c5521b561d20a881c843a64a5832721a
Author: Sarthak Kukreti <sarthakkukreti@...omium.org>
Date:   Mon Sep 27 03:39:10 2021 -0700

    mke2fs: add extended option for prezeroed storage devices
    
    This patch adds an extended option "assume_storage_prezeroed" to
    mke2fs. When enabled, this option acts as a hint to mke2fs that the
    underlying block device was zeroed before mke2fs was called.  This
    allows mke2fs to optimize out the zeroing of the inode table and the
    journal, which speeds up the filesystem creation time.
    
    Additionally, on thinly provisioned storage devices (like Ceph,
    dm-thin, newly created sparse loopback files), reads on unmapped
    extents return zero. This property allows mke2fs (with
    assume_storage_prezeroed) to avoid pre-allocating metadata space for
    inode tables for the entire filesystem and saves space that would
    normally be preallocated for zero inode tables.
    
    Tests
    -----
    1) Running 'mke2fs -t ext4' on 10G sparse files on an ext4
    filesystem drops the time taken by mke2fs from 0.09s to 0.04s
    and reduces the initial metadata space allocation (stat on
    sparse file) from 139736 blocks (545M) to 8672 blocks (34M).
    
    2) On ChromeOS (running linux kernel 4.19) with dm-thin
    and 200GB thin logical volumes using 'mke2fs -t ext4 <dev>':
    
    - Time taken by mke2fs drops from 1.07s to 0.08s.
    - Avoiding zeroing out the inode table and journal reduces the
      initial metadata space allocation from 0.48% to 0.01%.
    - Lazy inode table zeroing results in a further 1.45% of logical
      volume space getting allocated for inode tables, even if no file
      data is added to the filesystem. With assume_storage_prezeroed,
      the metadata allocation remains at 0.01%.
    
    [ Fixed regression test to work on newer versions of e2fsprogs -- TYT ]
    
    Signed-off-by: Sarthak Kukreti <sarthakkukreti@...omium.org>
    Signed-off-by: Theodore Ts'o <tytso@....edu>

diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index b378e4d7..30f97bb5 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -365,6 +365,13 @@ small risk if the system crashes before the journal has been overwritten
 entirely one time.  If the option value is omitted, it defaults to 1 to
 enable lazy journal inode zeroing.
 .TP
+.B assume_storage_prezeroed\fR[\fB= \fI<0 to disable, 1 to enable>\fR]
+If enabled,
+.BR mke2fs
+assumes that the storage device has been prezeroed, skips zeroing the journal
+and inode tables, and annotates the block group flags to signal that the inode
+table has been zeroed.
+.TP
 .B no_copy_xattrs
 Normally
 .B mke2fs
diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index c955b318..76b8b8c6 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -96,6 +96,7 @@ int	journal_flags;
 int	journal_fc_size;
 static e2_blkcnt_t	orphan_file_blocks;
 static int	lazy_itable_init;
+static int	assume_storage_prezeroed;
 static int	packed_meta_blocks;
 int		no_copy_xattrs;
 static char	*bad_blocks_filename = NULL;
@@ -1013,6 +1014,11 @@ static void parse_extended_opts(struct ext2_super_block *param,
 				lazy_itable_init = strtoul(arg, &p, 0);
 			else
 				lazy_itable_init = 1;
+		} else if (!strcmp(token, "assume_storage_prezeroed")) {
+			if (arg)
+				assume_storage_prezeroed = strtoul(arg, &p, 0);
+			else
+				assume_storage_prezeroed = 1;
 		} else if (!strcmp(token, "lazy_journal_init")) {
 			if (arg)
 				journal_flags |= strtoul(arg, &p, 0) ?
@@ -1131,7 +1137,8 @@ static void parse_extended_opts(struct ext2_super_block *param,
 			"\tnodiscard\n"
 			"\tencoding=<encoding>\n"
 			"\tencoding_flags=<flags>\n"
-			"\tquotatype=<quota type(s) to be enabled>\n\n"),
+			"\tquotatype=<quota type(s) to be enabled>\n"
+			"\tassume_storage_prezeroed=<0 to disable, 1 to enable>\n\n"),
 			badopt ? badopt : "");
 		free(buf);
 		exit(1);
@@ -3125,6 +3132,18 @@ int main (int argc, char *argv[])
 		io_channel_set_options(fs->io, opt_string);
 	}
 
+	if (assume_storage_prezeroed) {
+		if (verbose)
+			printf("%s",
+			       _("Assuming the storage device is prezeroed "
+			       "- skipping inode table and journal wipe\n"));
+
+		lazy_itable_init = 1;
+		itable_zeroed = 1;
+		zero_hugefile = 0;
+		journal_flags |= EXT2_MKJOURNAL_LAZYINIT;
+	}
+
 	/* Can't undo discard ... */
 	if (!noaction && discard && dev_size && (io_ptr != undo_io_manager)) {
 		retval = mke2fs_discard_device(fs);
diff --git a/tests/m_assume_storage_prezeroed/expect b/tests/m_assume_storage_prezeroed/expect
new file mode 100644
index 00000000..b735e242
--- /dev/null
+++ b/tests/m_assume_storage_prezeroed/expect
@@ -0,0 +1,2 @@
+> 10000
+224
diff --git a/tests/m_assume_storage_prezeroed/script b/tests/m_assume_storage_prezeroed/script
new file mode 100644
index 00000000..1a8d8463
--- /dev/null
+++ b/tests/m_assume_storage_prezeroed/script
@@ -0,0 +1,63 @@
+test_description="test prezeroed storage metadata allocation"
+FILE_SIZE=16M
+
+LOG=$test_name.log
+OUT=$test_name.out
+EXP=$test_dir/expect
+
+if test "$(id -u)" -ne 0 ; then
+    echo "$test_name: $test_description: skipped (not root)"
+elif ! command -v losetup >/dev/null ; then
+    echo "$test_name: $test_description: skipped (no losetup)"
+else
+    dd if=/dev/zero of=$TMPFILE.1 bs=1 count=0 seek=$FILE_SIZE >> $LOG 2>&1
+    dd if=/dev/zero of=$TMPFILE.2 bs=1 count=0 seek=$FILE_SIZE >> $LOG 2>&1
+
+    LOOP1=$(losetup --show --sector-size 4096 -f $TMPFILE.1)
+    if [ ! -b "$LOOP1" ]; then
+        echo "$test_name: $test_description: skipped (no loop devices)"
+        rm -f $TMPFILE.1 $TMPFILE.2
+        exit 0
+    fi
+    LOOP2=$(losetup --show --sector-size 4096 -f $TMPFILE.2)
+    if [ ! -b "$LOOP2" ]; then
+        echo "$test_name: $test_description: skipped (no loop devices)"
+        rm -f $TMPFILE.1 $TMPFILE.2
+	losetup -d $LOOP1
+        exit 0
+    fi
+
+    echo $MKE2FS -o Linux -t ext4 $LOOP1 >> $LOG 2>&1
+    $MKE2FS -o Linux -t ext4 $LOOP1 >> $LOG 2>&1
+    sync
+    stat $TMPFILE.1 >> $LOG 2>&1
+    SZ=$(stat -c "%b" $TMPFILE.1)
+    if test $SZ -gt 10000 ; then
+	echo "> 10000" > $OUT
+    else
+	echo "$SZ" > $OUT
+    fi
+
+    echo $MKE2FS -o Linux -t ext4 -E assume_storage_prezeroed=1 $LOOP2 >> $LOG 2>&1
+    $MKE2FS -o Linux -t ext4 -E assume_storage_prezeroed=1 $LOOP2 >> $LOG 2>&1
+    sync
+    stat $TMPFILE.2 >> $LOG 2>&1
+    stat -c "%b" $TMPFILE.2 >> $OUT
+
+    losetup -d $LOOP1
+    losetup -d $LOOP2
+    rm -f $TMPFILE.1 $TMPFILE.2
+
+    cmp -s $OUT $EXP
+    status=$?
+
+    if [ "$status" = 0 ] ; then
+	echo "$test_name: $test_description: ok"
+	touch $test_name.ok
+    else
+	echo "$test_name: $test_description: failed"
+	cat $LOG > $test_name.failed
+	diff $EXP $OUT >> $test_name.failed
+    fi
+fi
+unset LOG OUT EXP FILE_SIZE LOOP1 LOOP2
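
A note on reading the expect file: both values are block counts as
printed by stat -c "%b" for the 16M backing files.  The sketch below
converts them to KiB, assuming the unit is 512 bytes (stat -c "%B"
prints the actual unit on a given system).  The helper name
blocks_to_kib is hypothetical and is not part of the test suite:

```shell
#!/bin/sh
# Sketch: convert the block counts in
# tests/m_assume_storage_prezeroed/expect to KiB.
# Assumption: `stat -c %B` reports a 512-byte unit on this system.
set -eu

unit=512    # assumed; verify with: stat -c '%B' <file>

blocks_to_kib() {
    # $1 = block count as printed by `stat -c %b`
    echo $(( $1 * unit / 1024 ))
}

prezeroed_kib=$(blocks_to_kib 224)      # expected count with the option
threshold_kib=$(blocks_to_kib 10000)    # lower bound without the option

echo "with assume_storage_prezeroed: ${prezeroed_kib} KiB allocated"
echo "without it: more than ${threshold_kib} KiB allocated"
```

Under that assumption the test expects 112 KiB to be allocated with
the option and more than 5000 KiB without it.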
