linux-ext4 - Re: [PATCH 2/4] e2scrub: create online fsck tool of sorts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 12 Mar 2018 06:50:09 +0100
From:   Lukas Czerner <lczerner@...hat.com>
To:     Andreas Dilger <adilger@...ger.ca>
Cc:     "Darrick J. Wong" <darrick.wong@...cle.com>, tytso@....edu,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH 2/4] e2scrub: create online fsck tool of sorts

On Fri, Mar 09, 2018 at 01:40:01PM -0700, Andreas Dilger wrote:
> On Mar 9, 2018, at 1:18 PM, Lukas Czerner <lczerner@...hat.com> wrote:
> > 
> > On Fri, Mar 09, 2018 at 12:40:17PM -0700, Andreas Dilger wrote:
> >> On Mar 9, 2018, at 6:31 AM, Lukas Czerner <lczerner@...hat.com> wrote:
> >>> 
> >>> On Thu, Mar 08, 2018 at 12:13:27PM -0800, Darrick J. Wong wrote:
> >>>> From: Darrick J. Wong <darrick.wong@...cle.com>
> >>>> 
> >>>> Implement online fsck for ext* filesystems which live on LVM-managed
> >>>> logical volumes.  The basic strategy mirrors that of e2croncheck --
> >>>> create a snapshot, fsck the snapshot, report whatever errors appear,
> >>>> remove snapshot.  Unlike e2croncheck, this utility accepts any LVM
> >>>> device path, knows about snapshots running out of space, and can call
> >>>> fstrim having validated that the fs metadata is ok.
> >>> 
> >>> Thanks Darrick this looks good, I have couple of comments below. Also
> >>> it's not a big deal, but I have a slight issue with the name e2scrub.
> >>> What we will call it if/when we have a real in-kernel fs scrub ?
> >> 
> >> I'd think in that case the e2scrub script will call the in-kernel version
> >> of the scrub?  That might even be "e2fsck" just calling an ioctl, like
> >> resize2fs decides whether to do an online or offline resize depending if
> >> the filesystem is mounted or not.
> > 
> > Yeah, I suppose.
> > 
> >> 
> >>>> diff --git a/scrub/e2scrub.8.in b/scrub/e2scrub.8.in
> >>>> new file mode 100644
> >>>> index 0000000..7cb6dfb
> >>>> --- /dev/null
> >>>> +++ b/scrub/e2scrub.8.in
> >>>> @@ -0,0 +1,34 @@
> >>>> +.TH E2SCRUB 8 "@E2FSPROGS_MONTH@ @E2FSPROGS_YEAR@" "E2fsprogs version @E2FSPROGS_VERSION@"
> >>>> +.SH NAME
> >>>> +e2scrub - check a mounted ext2/ext3/ext4 file system on an LVM volume for errors.
> >>>> +.SH SYNOPSYS
> >>>> +.B
> >>>> +e2scrub [OPTION] [LVM DEVICE PATH]
> >>>> +.SH DESCRIPTION
> >>>> +Given a live file system on a LVM volume, this program snapshots the
> >>>> +logical volume and runs a file system check to look for serious errors.
> >>>> +If no errors are found, fstrim can be called on the mounted file system.
> >>>> +However, if errors are found, the file system should be unmounted and
> >>>> +fixed.
> >>> 
> >>> e2scrub will _not_ fix you file system, I think we should make this part
> >>> really clear.
> >>> 
> >>> I am missing some more information on what are the requirements for this to
> >>> work - like free space in the vg.
> >>> 
> >>> How about few words about disaster recovery, I know you're trying to
> >>> always remove the snapshot, but what if it fails for whatever reason ? I
> >>> think there are things users should know about so that they can help
> >>> themselves.
> >> 
> >> Later on in the patch series there is a command to reap stale snapshots
> >> via cron or systemd once a day.
> > 
> > True, but it might not be in place and regardless of how it's set up it
> > usefull to document what do to if anything goes wrong. As it is now the
> > documentation is really light on details of what is it actually doing.
> 
> Yeah, the lvscan script that I had would delete the old snapshot if it
> found one, after checking that there was not an older scan still running
> on the device.
> 
> One option would be to consolidate the "e2scrub_reap" functionality into
> e2scrub itself, and then allow "e2scrub -n" or "e2scrub --reap" to be
> used to only do the cleanup?  That way, at least the next e2scrub run
> will clean up the old snapshot rather than generating a useless error,
> or running a check on an old (stale) copy of the snapshot.

Yes, I would like the option to run e2scrub to do the cleanup.

-Lukas

> 
> That would also mean one less script to be installed and documented.
> 
> Cheers, Andreas
> 
> >>> Also in case this is run on dm-thin lv it's worth mentioning that lvm
> >>> will use the "old" snaphost instead if the thin snapshot. However this
> >>> is something we can recognize and use the right command.
> >>> 
> >>>> +.SH OPTIONS
> >>>> +.TP
> >>>> +\fB-t\fR
> >>>> +Run
> >>>> +.B
> >>>> +fstrim(1)
> >>>> +on the mounted filesystem if no errors are found.
> >>>> +.TP
> >>>> +\fB-V\fR
> >>>> +Print version information and exit.
> >>>> +.SH EXIT CODE
> >>>> +The exit codes are the same as in
> >>>> +.BR e2fsck (8)
> >>>> +.SH BUGS
> >>>> +This utility is capable of checking any ext* filesystem on an LVM volume,
> >>>> +regardless of whether it is mounted.
> >>> 
> >>> If it's not mounted it'll still use lvm snapshots right ? It's not
> >>> necessarily clear from the documentation.
> >> 
> >> I think yes, which is safest in case the filesystem suddenly is mounted
> >> in the middle of the scrub.
> > 
> > Definitelly, but that needs to be documented.
> > 
> >> 
> >>>> +.SH SEE ALSO
> >>>> +.BR e2fsck (8)
> >>>> +.SH AUTHOR
> >>>> +Darrick J. Wong <darrick.wong@...cle.com>
> >>>> +.SH COPYRIGHT
> >>>> +Copyright ©2018 Darrick J. Wong.  License is GPLv2+. <http://www.gnu.org/licenses/gpl-2.0.html>
> >>>> diff --git a/scrub/e2scrub.conf.in b/scrub/e2scrub.conf.in
> >>>> new file mode 100644
> >>>> index 0000000..fec828a
> >>>> --- /dev/null
> >>>> +++ b/scrub/e2scrub.conf.in
> >>>> @@ -0,0 +1,10 @@
> >>>> +# e2scrub configuration file
> >>>> +
> >>>> +# Snapshots will be created to run fsck; the snapshot will be of this size.
> >>>> +# snap_size_mb=256
> >>> 
> >>> Do you have any idea what it takes to fill this up ? What kind of
> >>> workload for how long ? Why do you think it's a sane default ?
> >>> 
> >>> I am well aware that there is no right answer I am just curious.
> >> 
> >> I've used 256MB in my lvcheck scripts for years without issues.  It
> >> really depends on how long the e2fsck takes and how much churn there
> >> is in the filesystem during that time.  Reserving a bit too much is not
> >> really harmful, since the space is only used temporarily.
> > 
> > Good to know.
> > 
> >> 
> >> What might be difficult is that users will typically consume all of
> >> the space in the VG from the beginning.  It would be great if distros
> >> reserved a few hundred MB in the VG at installation time for this.
> > 
> > That's very much true, I think lvm people can tell a lot of stories about
> > this one :)
> > 
> >> 
> >>>> diff --git a/scrub/e2scrub.in b/scrub/e2scrub.in
> >>>> new file mode 100644
> >>>> index 0000000..e26b449
> >>>> --- /dev/null
> >>>> +++ b/scrub/e2scrub.in
> >>>> @@ -0,0 +1,162 @@
> >>>> +#!/bin/bash
> >>>> +
> >>>> +#  Copyright (C) 2018 Oracle.  All Rights Reserved.
> >>>> +#
> >>>> +#  Author: Darrick J. Wong <darrick.wong@...cle.com>
> >>>> +#
> >>>> +#  This program is free software; you can redistribute it and/or
> >>>> +#  modify it under the terms of the GNU General Public License
> >>>> +#  as published by the Free Software Foundation; either version 2
> >>>> +#  of the License, or (at your option) any later version.
> >>>> +#
> >>>> +#  This program is distributed in the hope that it would be useful,
> >>>> +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>>> +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>>> +#  GNU General Public License for more details.
> >>>> +#
> >>>> +#  You should have received a copy of the GNU General Public License
> >>>> +#  along with this program; if not, write the Free Software Foundation,
> >>>> +#  Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> >>>> +
> >>>> +# Automatically check a LVM-managed filesystem online.
> >>>> +# We use lvm snapshots to do this, which means that we can only
> >>>> +# check filesystems in VGs that have at least 256mb (or so) of
> >>>> +# free space.
> >>>> +
> >>>> +snap_size_mb=256
> >>>> +fstrim=0
> >>>> +e2fsck_opts=""
> >>>> +conffile="@root_sysconfdir@...scrub.conf"
> >>>> +
> >>>> +test -f "${conffile}" && . "${conffile}"
> >>>> +
> >>>> +print_help() {
> >>>> +	echo "Usage: $0 [-t] device"
> >>>> +	echo
> >>>> +	echo "device must be a LVM-managed block device"
> >>>> +	echo "-t: Run fstrim if successful."
> >>>> +}
> >>>> +
> >>>> +print_version() {
> >>>> +	echo "e2scrub @E2FSPROGS_VERSION@ (@E2FSPROGS_DATE@)"
> >>>> +}
> >>>> +
> >>>> +while getopts "tV" opt; do
> >>>> +	case "${opt}" in
> >>>> +	"t") fstrim=1;;
> >>>> +	"V") print_version; exit 0;;
> >>>> +	*) print_help; exit 2;;
> >>>> +	esac
> >>>> +done
> >>>> +shift "$((OPTIND - 1))"
> >>>> +
> >>>> +dev="$1"
> >>>> +if [ -z "${dev}" ]; then
> >>>> +	print_help
> >>>> +	exit 1
> >>>> +elif [ ! -b "${dev}" ]; then
> >>>> +	echo "${dev}: Not a block device?"
> >>>> +	print_help
> >>>> +	exit 16
> >>>> +fi
> >>>> +
> >>>> +# Make sure this is an LVM device we can snapshot
> >>>> +vg="$(lvs --noheadings -o vg_name "${dev}" 2> /dev/null | sed -e 's/^  //g')"
> >>>> +lv="$(lvs --noheadings -o lv_name "${dev}" 2> /dev/null | sed -e 's/^  //g')"
> >>> 
> >>> You could use
> >>> 
> >>> eval $(lvs --nameprefixes -o name,vgname --noheadings "${dev}")
> >>> 
> >>> and have lvm set up $LVM2_VG_NAME and $LVM2_LV_NAME for you
> >>> 
> >>>> +if [ -z "${vg}" ] || [ -z "${lv}" ]; then
> >>>> +	echo "${dev}: Not a LVM device."
> >>>> +	exit 16
> >>>> +fi
> >>>> +start_time="$(date +'%Y%m%d%H%M%S')"
> >>>> +snap="${lv}.e2scrub"
> >>> 
> >>> Let's hope people are not creating these intentionally. Maybe using a
> >>> timestamp might be worthwhile ?
> >>> 
> >>>> +snap_dev="/dev/${vg}/${snap}"
> >>>> +fstype="$(blkid -p -s TYPE "${dev}" | sed -e 's/^.*TYPE="\(.*\)".*$/\1/g')"
> >>> 
> >>> blkid -o value -p -s TYPE "${dev}"
> >>> 
> >>>> +
> >>>> +case "${fstype}" in
> >>>> +"ext2"|"ext3"|"ext4")
> >>> 
> >>> ext[234]
> >>> 
> >>>> +	;;
> >>>> +*)
> >>>> +	echo "${dev}: Filesystem of type ${fstype} not supported."
> >>>> +	exit 16
> >>>> +	;;
> >>>> +esac
> >>>> +
> >>>> +teardown() {
> >>>> +	# Remove and wait for removal to succeed.
> >>>> +	lvremove -f "${vg}/${snap}" 3>&-
> >>>> +	while [ -b "${snap_dev}" ] && [ "$?" -eq "5" ]; do
> >>>> +		sleep 0.5
> >>>> +		lvremove -f "${vg}/${snap}" 3>&-
> >>>> +	done
> >>>> +}
> >>>> +
> >>>> +check() {
> >>>> +	# First we recover the journal, then we see if e2fsck tries any
> >>>> +	# non-optimization repairs.  If either of these two returns a
> >>>> +	# non-zero status (errors fixed or remaining) then this fs is bad.
> >>>> +	E2FSCK_FIXES_ONLY=1
> >>>> +	export E2FSCK_FIXES_ONLY
> >>>> +	${DBG} "@root_sbindir@...fsck" -E journal_only -p ${e2fsck_opts} "${snap_dev}" || return 1
> >>>> +	${DBG} "@root_sbindir@...fsck" -fy ${e2fsck_opts} "${snap_dev}" || return 1
> >>> 
> >>> Why to use -y ? Wouldn't it be better to just run with -p ?
> >>> 
> >>>> +	return 0
> >>>> +}
> >>>> +
> >>>> +mark_clean() {
> >>>> +	${DBG} "@root_sbindir@...ne2fs" -C 0 -T "${start_time}" "${dev}"
> >>> 
> >>> I am not sure about -C 0. I understand that it's there to elliminate the
> >>> mount-count based fsck, but it also means the mount-count is not going
> >>> to be reliable, not sure if that matters much.
> >> 
> >> IMHO, with the e2scrub functionality in place it would make sense to
> >> reinstate the time- and/or mount-based e2fsck.  That would catch the case
> >> if e2scrub or the cron/systemd jobs that run it were broken for some
> >> reason and hadn't checked the filesystem in months.
> > 
> > If the cron/systemd jobs are set up then it might make sense maybe. But
> > again after fs unmount the mount-count might be 0, not sure if that
> > matters to people ar all.
> > 
> >> 
> >>>> +}
> >>>> +
> >>>> +mark_corrupt() {
> >>>> +	${DBG} "@root_sbindir@...ne2fs" -E force_fsck "${dev}"
> >>>> +}
> >>>> +
> >>>> +setup() {
> >>>> +	# Create the snapshot, wait for device to appear
> >>>> +	teardown > /dev/null 2> /dev/null
> >>>> +	lvcreate -s -L "${snap_size_mb}m" -n "${snap}" "${vg}/${lv}" 3>&-
> >>>> +	test $? -ne 0 && return 1
> >>>> +	udevadm settle 2> /dev/null
> >>>> +	return 0
> >>>> +}
> >>>> +
> >>>> +trap "teardown; exit 1" EXIT INT QUIT TERM
> >>>> +if ! setup; then
> >>>> +	echo "Snapshot of ${dev} FAILED, will not check!"
> >>>> +	exit 1
> >>>> +fi
> >>>> +
> >>>> +# Check and react
> >>>> +if check; then
> >>>> +	echo "Scrub of ${dev} succeeded."
> >>>> +	mark_clean
> >>>> +
> >>>> +	if [ "${fstrim}" -eq 1 ] && type fstrim > /dev/null 2>&1; then
> >>>> +		dir="$(lsblk -o MOUNTPOINT -n "${dev}")"
> >>>> +		if [ -d "${dir}" ]; then
> >>>> +			# NB: fstrim fails with snapshot present
> >>>> +			trap '' EXIT
> >>>> +			teardown
> >>>> +			fstrim -v "${dir}"
> >>> 
> >>> This can take a long time, how about some information for the user what
> >>> we're doing ?
> >> 
> >> This will typically be run via cron/systemd, so you don't want it to be
> >> too verbose.
> > 
> > But it does not necessarily have to be run from cron. Then it'll say
> > 
> > Scrub of ${dev} succeeded.
> > 
> > ... then hangs for minutes or potentially even longer, that's not good.
> > 
> > 
> > Remember, if we get it into e2fsprogs it'll get to distributions and
> > people will start using and experimenting with it. People that not
> > necessarily know much about lvm or fstrim. It needs to be well documented
> > and verbose enough for people to at least have some idea of what's going on.
> > 
> > Especially since I think it might attract some attention, if nothing
> > else then just because btrfs has scrub ;)
> > 
> > Regards,
> > -Lukas
> > 
> >> 
> >> Cheers, Andreas
> 
> 
> Cheers, Andreas
> 
> 
> 
> 
>