Message-ID: <20240511211111.GA8330@mit.edu>
Date: Sat, 11 May 2024 17:11:11 -0400
From: "Theodore Ts'o" <tytso@....edu>
To: Johannes Schauer Marin Rodrigues <josch@...ter-muffin.de>
Cc: linux-ext4@...r.kernel.org
Subject: Re: created ext4 disk image differs depending on the underlying
 filesystem

On Sat, May 11, 2024 at 07:34:42AM +0200, Johannes Schauer Marin Rodrigues wrote:
>  2. allow resetting fs->super->s_kbytes_written to zero. This patch worked for
>     me:
> 
> Would you be happy about a patch for (2.)? If yes, I can send something over
> once I find some time. :)
> 

I'm currently going back and forth about whether we should (a) just
unconditionally set s_kbytes_written to zero before we write out the
superblock, (b) add a new extended operation, or (c) do a hack where,
if SOURCE_DATE_EPOCH is set, we take that as an implied "set
s_kbytes_written to zero, since the user probably cares about a
reproducible file system".

(c) is a bit hacky, but it's the most convenient for users, and it
avoids adding Yet Another extended operation.
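
To make option (c) concrete, here is a minimal sketch.  It is not
e2fsprogs code: struct sb_sketch and apply_reproducible_default() are
hypothetical names standing in for the real superblock and for wherever
mke2fs would make this decision, and the only thing it models is "a
valid SOURCE_DATE_EPOCH implies a zeroed write counter".

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the one superblock field discussed here. */
struct sb_sketch {
	uint64_t kbytes_written;
};

/*
 * If epoch_str (normally the result of getenv("SOURCE_DATE_EPOCH")) is
 * a valid non-negative decimal integer, zero the write counter and
 * return 1.  Otherwise leave the superblock alone and return 0.
 */
int apply_reproducible_default(struct sb_sketch *sb, const char *epoch_str)
{
	char *end;

	/* Unset, empty, or not starting with a digit: no reproducible mode */
	if (!epoch_str || *epoch_str < '0' || *epoch_str > '9')
		return 0;
	errno = 0;
	(void) strtoull(epoch_str, &end, 10);
	if (errno || *end != '\0')
		return 0;
	sb->kbytes_written = 0;
	return 1;
}
```

A caller would pass getenv("SOURCE_DATE_EPOCH") just before the
superblock is written out; an explicit extended option, if one were
added, could still override the result afterwards.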

Related to this is the design question of whether SOURCE_DATE_EPOCH
should imply using a fixed value for s_uuid and s_hash_seed.  Again,
it's a little weird to overload SOURCE_DATE_EPOCH so that it also sets
the uuid and hash_seed to some fixed value, which might be a time-based
UUID with the ethernet address set to all zeroes, or some other fixed
value.  But it's a pretty good proxy for what the user wants, and if
this is the default, the user can always override it via an extended
option if they really want something different.
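
As an illustration of the "time-based UUID with the ethernet address set
to all zeroes" idea, the sketch below builds a version 1 UUID in the
RFC 4122 layout from a Unix timestamp.  The function names are
hypothetical, and zeroing the clock sequence as well as the node is an
assumption made here so that the result depends only on the timestamp;
nothing in this message says e2fsprogs would derive the value this way.

```c
#include <stdint.h>
#include <stdio.h>

/* 100ns ticks between 1582-10-15 (the UUID epoch) and 1970-01-01. */
#define UUID_EPOCH_OFFSET 0x01B21DD213814000ULL

/*
 * Build a version 1 (time-based) UUID from a Unix timestamp, with the
 * clock sequence and the node (ethernet address) set to all zeroes.
 */
void uuid_from_epoch(uint64_t epoch_secs, unsigned char out[16])
{
	uint64_t ts = epoch_secs * 10000000ULL + UUID_EPOCH_OFFSET;
	uint32_t time_low = (uint32_t) ts;
	uint16_t time_mid = (uint16_t) (ts >> 32);
	/* Top 4 bits of this field carry the version number (1) */
	uint16_t time_hi = (uint16_t) (((ts >> 48) & 0x0FFF) | 0x1000);
	int i;

	out[0] = (unsigned char) (time_low >> 24);
	out[1] = (unsigned char) (time_low >> 16);
	out[2] = (unsigned char) (time_low >> 8);
	out[3] = (unsigned char) time_low;
	out[4] = (unsigned char) (time_mid >> 8);
	out[5] = (unsigned char) time_mid;
	out[6] = (unsigned char) (time_hi >> 8);
	out[7] = (unsigned char) time_hi;
	out[8] = 0x80;		/* RFC 4122 variant, clock sequence 0 */
	out[9] = 0;
	for (i = 10; i < 16; i++)
		out[i] = 0;	/* node: all zeroes */
}

/* Format a 16-byte UUID as the usual 36-character string plus NUL. */
void uuid_to_string(const unsigned char u[16], char str[37])
{
	snprintf(str, 37,
		 "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
		 "%02x%02x%02x%02x%02x%02x",
		 u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
		 u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

With a timestamp of 0 this yields 13814000-1dd2-11b2-8000-000000000000,
and the same SOURCE_DATE_EPOCH always gives the same UUID, which is the
property a reproducible image needs.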

If it weren't for the fact that I'm considering having SOURCE_DATE_EPOCH
provide default values for s_uuid and s_hash_seed, I'd be tempted to just
unconditionally set s_kbytes_written to zero.

I'm curious what your opinions might be on this, as someone who might
want to use this feature.

> As an end-user I am very interested in keeping the functionality of mke2fs
> which keeps track of which parts are actually sparse and which ones are not.
> This functionality can be used with tools like "bmaptool" (a more clever dd) to
> only copy those parts of the image to the flash drive which are actually
> supposed to contain data.

If the file system where the image is created supports either the
FIEMAP ioctl or lseek SEEK_HOLE, then "bmaptool create" can figure
out which parts of the file are sparse, so we don't need to make any
changes to e2fsprogs.  If the file system doesn't support FIEMAP or
SEEK_HOLE, one could imagine that bmaptool could figure out which
parts of the file could be sparse simply by looking for blocks that
are all zeroes.  This is basically what "cp --sparse=always" or the
attached make-sparse.c file does to determine where the holes could be.
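
For reference, walking a file's data extents with lseek looks roughly
like the sketch below.  It assumes Linux, where SEEK_DATA and SEEK_HOLE
are exposed by glibc under _GNU_SOURCE; list_data_extents() is a
hypothetical helper, not something bmaptool or e2fsprogs provides, and
the "start length" output is only an illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for SEEK_DATA and SEEK_HOLE */
#endif
#include <sys/types.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Print each data extent of fd as "start length" (in bytes) on out.
 * Returns the number of extents, or -1 on error.  The file offset of
 * fd is changed as a side effect.
 */
long list_data_extents(int fd, FILE *out)
{
	off_t end = lseek(fd, 0, SEEK_END);
	off_t data, hole = 0;
	long n = 0;

	if (end < 0)
		return -1;
	while (hole < end) {
		data = lseek(fd, hole, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break;	/* nothing but a hole up to EOF */
			return -1;
		}
		/* There is always an implicit hole at end of file */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;
		fprintf(out, "%lld %lld\n", (long long) data,
			(long long) (hole - data));
		n++;
	}
	return n;
}
```

On a file system that does not track holes, the whole file is reported
as a single data extent, which is the case where scanning for blocks of
zeroes is the only way to find the sparse regions.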

Yes, I could imagine adding a new io_manager, much like test_io and
undo_io, which tracked which blocks had been written and then wrote
out a BMAP file.  However, the vast majority of constructed file
systems are quite small, so simply reading all of the blocks to
determine which ones are all zeroes, a la cp --sparse=always, isn't
going to involve all that much overhead.  And I'd argue the right thing
to do would be to teach bmaptool how to do what cp --sparse=always
does, so that the same interface works regardless of whether bmaptool
is running on a modern file system that supports FIEMAP or SEEK_HOLE,
or on some legacy file system like FAT16 or FAT32.
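
A rough sketch of that fallback is below: read the image block by block
and report the runs of blocks that contain any non-zero byte.  The 4096
byte block size, the function names, and the plain "first last" output
are all assumptions for illustration; this is not the BMAP file format
and not how bmaptool is implemented.

```c
#include <stdio.h>

#define SCAN_BLOCK 4096		/* assumed block size for this sketch */

/* Return 1 if the first len bytes of buf are all zero. */
static int all_zero(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i])
			return 0;
	return 1;
}

/*
 * Read in from its current position to EOF and print one
 * "first_block last_block" line on out for each run of blocks that
 * contains data.  Returns the number of runs printed.
 */
long scan_mapped_ranges(FILE *in, FILE *out)
{
	unsigned char buf[SCAN_BLOCK];
	long blk = 0, start = -1, runs = 0;
	size_t got;

	while ((got = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (!all_zero(buf, got)) {
			if (start < 0)
				start = blk;	/* a new run begins */
		} else if (start >= 0) {
			fprintf(out, "%ld %ld\n", start, blk - 1);
			runs++;
			start = -1;
		}
		blk++;
	}
	if (start >= 0) {	/* the image ended inside a run */
		fprintf(out, "%ld %ld\n", start, blk - 1);
		runs++;
	}
	return runs;
}
```

Since this only reads the file, it gives the same answer on FAT as on
ext4, at the cost of one sequential pass over the image.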

Cheers,

						- Ted
/*
 * make-sparse.c --- make a sparse file from stdin
 * 
 * Copyright 2004 by Theodore Ts'o.
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License.
 * %End-Header%
 */

#define _LARGEFILE_SOURCE
#define _LARGEFILE64_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

int full_read(int fd, char *buf, size_t count)
{
	int got, total = 0;
	int pass = 0;

	while (count > 0) {
		got = read(fd, buf, count);
		if (got == -1) {
			if ((errno == EINTR) || (errno == EAGAIN)) 
				continue;
			return total ? total : -1;
		}
		if (got == 0) {
			if (pass++ >= 3)
				return total;
			continue;
		}
		pass = 0;
		buf += got;
		total += got;
		count -= got;
	}
	return total;
}

int main(int argc, char **argv)
{
	int fd, got, i;
	int zflag = 0;
	char buf[1024];

	if (argc != 2) {
		fprintf(stderr, "Usage: make-sparse out-file\n");
		exit(1);
	}
	fd = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0777);
	if (fd < 0) {
		perror(argv[1]);
		exit(1);
	}
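	while (1) {
		got = full_read(0, buf, sizeof(buf));
		if (got <= 0)	/* EOF, or a read error with nothing read */
			break;
		if (got == sizeof(buf)) {
			/* A full block of zeroes is skipped, leaving a hole */
			for (i=0; i < (int) sizeof(buf); i++) 
				if (buf[i])
					break;
			if (i == sizeof(buf)) {
				lseek(fd, sizeof(buf), SEEK_CUR);
				zflag = 1;
				continue;
			}
		}
		zflag = 0;
		if (write(fd, buf, got) != got) {
			perror("write");
			exit(1);
		}
	}
	/*
	 * If the input ended in a hole, write a single zero byte at the
	 * last offset so that the file is extended to its full size.
	 */
	if (zflag) {
		lseek(fd, -1, SEEK_CUR);
		buf[0] = 0;
		if (write(fd, buf, 1) != 1) {
			perror("write");
			exit(1);
		}
	}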
	return 0;
}
		
