Message-Id: <B9F63236-3539-492E-83EB-3377836C2FDB@dilger.ca>
Date:	Sun, 1 Nov 2015 18:16:50 -0700
From:	Andreas Dilger <adilger@...ger.ca>
To:	linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Corruption from interrupted e2fsck

It looks like there is a bug in how e2fsck handles being interrupted by CTRL-C.
If CTRL-C is pressed to kill e2fsck (rather than, e.g., kill -9), then the
interrupt handler sets E2F_FLAG_CANCEL in the context but doesn't actually
kill the process.  Instead, e2fsck_pass1() checks this flag before processing
the next inode.
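A minimal standalone sketch of the cooperative-cancellation pattern described
above: the handler only records the request in a flag, and the scan loop polls
that flag before each inode. The names (cancel_requested, on_cancel,
install_cancel_handler, scan_inodes) are hypothetical and not taken from
e2fsprogs; the loop raises SIGINT itself to simulate CTRL-C arriving at a known
point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written from the signal handler; sig_atomic_t is the type the C
 * standard guarantees can be safely assigned from a handler. */
static volatile sig_atomic_t cancel_requested = 0;

/* Like signal_cancel(): record the request, do not exit. */
static void on_cancel(int sig)
{
    (void)sig;
    cancel_requested = 1;
}

/* Install on_cancel for SIGINT and SIGTERM; returns 0 on success. */
static int install_cancel_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_cancel;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) < 0)
        return -1;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical stand-in for a pass 1 style scan: visits inodes in order
 * and checks the flag before each one.  For demonstration it raises
 * SIGINT itself right after visiting inode 'interrupt_after'.
 * Returns the number of inodes visited. */
static int scan_inodes(int n_inodes, int interrupt_after)
{
    int visited = 0;

    assert(n_inodes >= 0);
    for (int ino = 0; ino < n_inodes; ino++) {
        if (cancel_requested)
            break;              /* stop before the next inode */
        visited++;              /* "process" inode ino */
        if (ino == interrupt_after)
            raise(SIGINT);      /* handler runs before raise() returns */
    }
    return visited;
}
```

After install_cancel_handler(), calling scan_inodes(100, 4) visits inodes 0
through 4 and then stops with cancel_requested set.  Nothing after the loop is
skipped automatically, so any code that runs after the scan has to check the
flag itself.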

If e2fsck running in fix mode (e2fsck -fy) is interrupted and the quota
feature is enabled, then the quota file will still be written to disk even
though the inode scan was not complete and the quota information is totally
inaccurate.  Even worse, if the Pass 1 inode and block scan did not finish,
then the in-memory block bitmaps (which are used for block allocation during
e2fsck) are also invalid, so any blocks allocated to the quota files may
corrupt other files if those blocks were actually in use.
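To illustrate why a partial Pass 1 makes both the usage totals and the
in-memory bitmap untrustworthy, here is a toy model.  Everything in it is
hypothetical, not e2fsprogs code: owner[] plays the role of the on-disk truth,
build_bitmap() marks only blocks belonging to inodes that were scanned and
counts them as a crude stand-in for quota usage, and alloc_block() is a
first-fit allocator that trusts the bitmap.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 16
#define FREE    (-1)

/* Hypothetical on-disk truth: owner[b] is the inode using block b,
 * or FREE if the block is unused. */
static const int owner[NBLOCKS] = {
    0, 0, 1, 1, 2, 2, 3, 3,
    FREE, FREE, FREE, FREE, FREE, FREE, FREE, FREE
};

/* Build the in-memory "block in use" bitmap as if only inodes
 * [0, scanned) had been examined.  Blocks of unscanned inodes are never
 * marked.  Returns the number of blocks counted as in use. */
static int build_bitmap(bool in_use[NBLOCKS], int scanned)
{
    int counted = 0;

    assert(scanned >= 0);
    for (int b = 0; b < NBLOCKS; b++) {
        in_use[b] = owner[b] != FREE && owner[b] < scanned;
        if (in_use[b])
            counted++;
    }
    return counted;
}

/* First-fit allocator driven only by the in-memory bitmap.
 * Returns the allocated block number, or -1 if none is free. */
static int alloc_block(bool in_use[NBLOCKS])
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (!in_use[b]) {
            in_use[b] = true;
            return b;
        }
    }
    return -1;
}
```

With all four inodes scanned, the count is 8 and the first allocation returns
block 8, which really is free.  If the scan stops after inodes 0 and 1, the
count drops to 4 and the allocator hands out block 4, which inode 2 still owns
on disk.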

It also looks like the journal may be recreated after e2fsck is interrupted,
if it was deleted during pass 1 because of corruption.

static void signal_cancel(int sig EXT2FS_ATTR((unused)))
{
        e2fsck_t ctx = e2fsck_global_ctx;

        if (!ctx)
                exit(FSCK_CANCELED);

        ctx->flags |= E2F_FLAG_CANCEL;
}


	sa.sa_handler = signal_cancel;
	sigaction(SIGINT, &sa, 0);
	sigaction(SIGTERM, &sa, 0);
	:
	:
        run_result = e2fsck_run(ctx);
        e2fsck_clear_progbar(ctx);

        if (!ctx->invalid_bitmaps &&
            (ctx->flags & E2F_FLAG_JOURNAL_INODE)) {
		if (fix_problem(ctx, PR_6_RECREATE_JOURNAL, &pctx)) {
			:
			:
			retval = ext2fs_add_journal_inode(fs, journal_size, 0);
		}
	}

no_journal:
	if (ctx->qctx) {
		for (i = 0; i < MAXQUOTAS; i++) {
			retval = quota_compare_and_update(ctx->qctx, i, &needs_writeout);
		}
	}

	if (run_result & E2F_FLAG_ABORT)
		fatal_error(ctx, _("aborted"));

Is there a reason not to have a cancel check right after the return from
e2fsck_run(), rather than trying to recreate the journal and update the
quota files?  I can imagine that there is a desire to flush out modified
inodes and such that have been repaired, so that restarting an interrupted
e2fsck will make progress, but the quota file update is plain wrong unless
at least pass 1 has finished, and the journal recreation is also dangerous
if the block bitmaps have not been fully updated.
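One possible shape for such a check, as a hedged sketch only: struct
fsck_state, finish_run() and the FLAG_* values are hypothetical, loosely
modeled on the excerpt above, and this is not the actual patch.  The idea is
to test for cancel or abort immediately after the passes return, before
anything that depends on complete pass 1 state is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags, loosely modeled on E2F_FLAG_CANCEL/E2F_FLAG_ABORT. */
enum {
    FLAG_CANCEL = 0x1,
    FLAG_ABORT  = 0x2,
};

struct fsck_state {
    int  flags;
    bool invalid_bitmaps;
    bool journal_deleted;     /* journal inode cleared during pass 1 */
    bool quota_enabled;
    /* Record what the post-run phase decided to do. */
    bool recreated_journal;
    bool wrote_quota;
};

/* Hypothetical post-run phase.  Bails out first if the run was cancelled
 * or aborted, so nothing derived from possibly incomplete pass 1 state
 * (block bitmaps, quota usage) is written.  Returns false on bail-out. */
static bool finish_run(struct fsck_state *s, int run_result)
{
    assert(s != NULL);
    if ((run_result | s->flags) & (FLAG_CANCEL | FLAG_ABORT))
        return false;

    if (!s->invalid_bitmaps && s->journal_deleted)
        s->recreated_journal = true;  /* would allocate journal blocks */

    if (s->quota_enabled)
        s->wrote_quota = true;        /* would rewrite quota files */

    return true;
}
```

Flushing inodes that were already repaired, as discussed above, is not
modeled here; it could still happen on the cancel path without touching the
bitmaps or quota files.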

The quota problem was actually hit on a system, but the journal problem is
only a theory at this point.  I'm working on a patch, but wanted to solicit
input in case there is something that I'm missing.

Cheers, Andreas