linux-ext4 - Re: [PATCH 2/4] e2scrub: create online fsck tool of sorts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <8822E97B-ABDC-428A-84DF-FC3CBBDCC7AF@dilger.ca>
Date:   Fri, 9 Mar 2018 12:40:17 -0700
From:   Andreas Dilger <adilger@...ger.ca>
To:     Lukas Czerner <lczerner@...hat.com>
Cc:     "Darrick J. Wong" <darrick.wong@...cle.com>, tytso@....edu,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH 2/4] e2scrub: create online fsck tool of sorts

On Mar 9, 2018, at 6:31 AM, Lukas Czerner <lczerner@...hat.com> wrote:
> 
> On Thu, Mar 08, 2018 at 12:13:27PM -0800, Darrick J. Wong wrote:
>> From: Darrick J. Wong <darrick.wong@...cle.com>
>> 
>> Implement online fsck for ext* filesystems which live on LVM-managed
>> logical volumes.  The basic strategy mirrors that of e2croncheck --
>> create a snapshot, fsck the snapshot, report whatever errors appear,
>> remove snapshot.  Unlike e2croncheck, this utility accepts any LVM
>> device path, knows about snapshots running out of space, and can call
>> fstrim having validated that the fs metadata is ok.
> 
> Thanks Darrick this looks good, I have couple of comments below. Also
> it's not a big deal, but I have a slight issue with the name e2scrub.
> What we will call it if/when we have a real in-kernel fs scrub ?

I'd think in that case the e2scrub script will call the in-kernel version
of the scrub?  That might even be "e2fsck" just calling an ioctl, like
resize2fs decides whether to do an online or offline resize depending if
the filesystem is mounted or not.

>> diff --git a/scrub/e2scrub.8.in b/scrub/e2scrub.8.in
>> new file mode 100644
>> index 0000000..7cb6dfb
>> --- /dev/null
>> +++ b/scrub/e2scrub.8.in
>> @@ -0,0 +1,34 @@
>> +.TH E2SCRUB 8 "@E2FSPROGS_MONTH@ @E2FSPROGS_YEAR@" "E2fsprogs version @E2FSPROGS_VERSION@"
>> +.SH NAME
>> +e2scrub - check a mounted ext2/ext3/ext4 file system on an LVM volume for errors.
>> +.SH SYNOPSYS
>> +.B
>> +e2scrub [OPTION] [LVM DEVICE PATH]
>> +.SH DESCRIPTION
>> +Given a live file system on a LVM volume, this program snapshots the
>> +logical volume and runs a file system check to look for serious errors.
>> +If no errors are found, fstrim can be called on the mounted file system.
>> +However, if errors are found, the file system should be unmounted and
>> +fixed.
> 
> e2scrub will _not_ fix you file system, I think we should make this part
> really clear.
> 
> I am missing some more information on what are the requirements for this to
> work - like free space in the vg.
> 
> How about few words about disaster recovery, I know you're trying to
> always remove the snapshot, but what if it fails for whatever reason ? I
> think there are things users should know about so that they can help
> themselves.

Later on in the patch series there is a command to reap stale snapshots
via cron or systemd once a day.

> Also in case this is run on dm-thin lv it's worth mentioning that lvm
> will use the "old" snaphost instead if the thin snapshot. However this
> is something we can recognize and use the right command.
> 
>> +.SH OPTIONS
>> +.TP
>> +\fB-t\fR
>> +Run
>> +.B
>> +fstrim(1)
>> +on the mounted filesystem if no errors are found.
>> +.TP
>> +\fB-V\fR
>> +Print version information and exit.
>> +.SH EXIT CODE
>> +The exit codes are the same as in
>> +.BR e2fsck (8)
>> +.SH BUGS
>> +This utility is capable of checking any ext* filesystem on an LVM volume,
>> +regardless of whether it is mounted.
> 
> If it's not mounted it'll still use lvm snapshots right ? It's not
> necessarily clear from the documentation.

I think yes, which is safest in case the filesystem suddenly is mounted
in the middle of the scrub.

>> +.SH SEE ALSO
>> +.BR e2fsck (8)
>> +.SH AUTHOR
>> +Darrick J. Wong <darrick.wong@...cle.com>
>> +.SH COPYRIGHT
>> +Copyright ©2018 Darrick J. Wong.  License is GPLv2+. <http://www.gnu.org/licenses/gpl-2.0.html>
>> diff --git a/scrub/e2scrub.conf.in b/scrub/e2scrub.conf.in
>> new file mode 100644
>> index 0000000..fec828a
>> --- /dev/null
>> +++ b/scrub/e2scrub.conf.in
>> @@ -0,0 +1,10 @@
>> +# e2scrub configuration file
>> +
>> +# Snapshots will be created to run fsck; the snapshot will be of this size.
>> +# snap_size_mb=256
> 
> Do you have any idea what it takes to fill this up ? What kind of
> workload for how long ? Why do you think it's a sane default ?
> 
> I am well aware that there is no right answer I am just curious.

I've used 256MB in my lvcheck scripts for years without issues.  It
really depends on how long the e2fsck takes and how much churn there
is in the filesystem during that time.  Reserving a bit too much is not
really harmful, since the space is only used temporarily.

What might be difficult is that users will typically consume all of
the space in the VG from the beginning.  It would be great if distros
reserved a few hundred MB in the VG at installation time for this.

>> diff --git a/scrub/e2scrub.in b/scrub/e2scrub.in
>> new file mode 100644
>> index 0000000..e26b449
>> --- /dev/null
>> +++ b/scrub/e2scrub.in
>> @@ -0,0 +1,162 @@
>> +#!/bin/bash
>> +
>> +#  Copyright (C) 2018 Oracle.  All Rights Reserved.
>> +#
>> +#  Author: Darrick J. Wong <darrick.wong@...cle.com>
>> +#
>> +#  This program is free software; you can redistribute it and/or
>> +#  modify it under the terms of the GNU General Public License
>> +#  as published by the Free Software Foundation; either version 2
>> +#  of the License, or (at your option) any later version.
>> +#
>> +#  This program is distributed in the hope that it would be useful,
>> +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +#  GNU General Public License for more details.
>> +#
>> +#  You should have received a copy of the GNU General Public License
>> +#  along with this program; if not, write the Free Software Foundation,
>> +#  Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
>> +
>> +# Automatically check a LVM-managed filesystem online.
>> +# We use lvm snapshots to do this, which means that we can only
>> +# check filesystems in VGs that have at least 256mb (or so) of
>> +# free space.
>> +
>> +snap_size_mb=256
>> +fstrim=0
>> +e2fsck_opts=""
>> +conffile="@root_sysconfdir@...scrub.conf"
>> +
>> +test -f "${conffile}" && . "${conffile}"
>> +
>> +print_help() {
>> +	echo "Usage: $0 [-t] device"
>> +	echo
>> +	echo "device must be a LVM-managed block device"
>> +	echo "-t: Run fstrim if successful."
>> +}
>> +
>> +print_version() {
>> +	echo "e2scrub @E2FSPROGS_VERSION@ (@E2FSPROGS_DATE@)"
>> +}
>> +
>> +while getopts "tV" opt; do
>> +	case "${opt}" in
>> +	"t") fstrim=1;;
>> +	"V") print_version; exit 0;;
>> +	*) print_help; exit 2;;
>> +	esac
>> +done
>> +shift "$((OPTIND - 1))"
>> +
>> +dev="$1"
>> +if [ -z "${dev}" ]; then
>> +	print_help
>> +	exit 1
>> +elif [ ! -b "${dev}" ]; then
>> +	echo "${dev}: Not a block device?"
>> +	print_help
>> +	exit 16
>> +fi
>> +
>> +# Make sure this is an LVM device we can snapshot
>> +vg="$(lvs --noheadings -o vg_name "${dev}" 2> /dev/null | sed -e 's/^  //g')"
>> +lv="$(lvs --noheadings -o lv_name "${dev}" 2> /dev/null | sed -e 's/^  //g')"
> 
> You could use
> 
> eval $(lvs --nameprefixes -o name,vgname --noheadings "${dev}")
> 
> and have lvm set up $LVM2_VG_NAME and $LVM2_LV_NAME for you
> 
>> +if [ -z "${vg}" ] || [ -z "${lv}" ]; then
>> +	echo "${dev}: Not a LVM device."
>> +	exit 16
>> +fi
>> +start_time="$(date +'%Y%m%d%H%M%S')"
>> +snap="${lv}.e2scrub"
> 
> Let's hope people are not creating these intentionally. Maybe using a
> timestamp might be worthwhile ?
> 
>> +snap_dev="/dev/${vg}/${snap}"
>> +fstype="$(blkid -p -s TYPE "${dev}" | sed -e 's/^.*TYPE="\(.*\)".*$/\1/g')"
> 
> blkid -o value -p -s TYPE "${dev}"
> 
>> +
>> +case "${fstype}" in
>> +"ext2"|"ext3"|"ext4")
> 
> ext[234]
> 
>> +	;;
>> +*)
>> +	echo "${dev}: Filesystem of type ${fstype} not supported."
>> +	exit 16
>> +	;;
>> +esac
>> +
>> +teardown() {
>> +	# Remove and wait for removal to succeed.
>> +	lvremove -f "${vg}/${snap}" 3>&-
>> +	while [ -b "${snap_dev}" ] && [ "$?" -eq "5" ]; do
>> +		sleep 0.5
>> +		lvremove -f "${vg}/${snap}" 3>&-
>> +	done
>> +}
>> +
>> +check() {
>> +	# First we recover the journal, then we see if e2fsck tries any
>> +	# non-optimization repairs.  If either of these two returns a
>> +	# non-zero status (errors fixed or remaining) then this fs is bad.
>> +	E2FSCK_FIXES_ONLY=1
>> +	export E2FSCK_FIXES_ONLY
>> +	${DBG} "@root_sbindir@...fsck" -E journal_only -p ${e2fsck_opts} "${snap_dev}" || return 1
>> +	${DBG} "@root_sbindir@...fsck" -fy ${e2fsck_opts} "${snap_dev}" || return 1
> 
> Why to use -y ? Wouldn't it be better to just run with -p ?
> 
>> +	return 0
>> +}
>> +
>> +mark_clean() {
>> +	${DBG} "@root_sbindir@...ne2fs" -C 0 -T "${start_time}" "${dev}"
> 
> I am not sure about -C 0. I understand that it's there to elliminate the
> mount-count based fsck, but it also means the mount-count is not going
> to be reliable, not sure if that matters much.

IMHO, with the e2scrub functionality in place it would make sense to
reinstate the time- and/or mount-based e2fsck.  That would catch the case
if e2scrub or the cron/systemd jobs that run it were broken for some
reason and hadn't checked the filesystem in months.

>> +}
>> +
>> +mark_corrupt() {
>> +	${DBG} "@root_sbindir@...ne2fs" -E force_fsck "${dev}"
>> +}
>> +
>> +setup() {
>> +	# Create the snapshot, wait for device to appear
>> +	teardown > /dev/null 2> /dev/null
>> +	lvcreate -s -L "${snap_size_mb}m" -n "${snap}" "${vg}/${lv}" 3>&-
>> +	test $? -ne 0 && return 1
>> +	udevadm settle 2> /dev/null
>> +	return 0
>> +}
>> +
>> +trap "teardown; exit 1" EXIT INT QUIT TERM
>> +if ! setup; then
>> +	echo "Snapshot of ${dev} FAILED, will not check!"
>> +	exit 1
>> +fi
>> +
>> +# Check and react
>> +if check; then
>> +	echo "Scrub of ${dev} succeeded."
>> +	mark_clean
>> +
>> +	if [ "${fstrim}" -eq 1 ] && type fstrim > /dev/null 2>&1; then
>> +		dir="$(lsblk -o MOUNTPOINT -n "${dev}")"
>> +		if [ -d "${dir}" ]; then
>> +			# NB: fstrim fails with snapshot present
>> +			trap '' EXIT
>> +			teardown
>> +			fstrim -v "${dir}"
> 
> This can take a long time, how about some information for the user what
> we're doing ?

This will typically be run via cron/systemd, so you don't want it to be
too verbose.

Cheers, Andreas






Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)