Message-ID: <20180315171036.GD13424@magnolia>
Date: Thu, 15 Mar 2018 10:10:36 -0700
From: "Darrick J. Wong" <darrick.wong@...cle.com>
To: Andreas Dilger <adilger@...ger.ca>
Cc: tytso@....edu, lczerner@...hat.com, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 2/4] e2scrub: create online fsck tool of sorts
On Wed, Mar 14, 2018 at 09:23:18PM -0700, Darrick J. Wong wrote:
> On Wed, Mar 14, 2018 at 10:03:15PM -0600, Andreas Dilger wrote:
> > On Mar 14, 2018, at 12:17 AM, Darrick J. Wong <darrick.wong@...cle.com> wrote:
> > >
> > > From: Darrick J. Wong <darrick.wong@...cle.com>
> > >
> > > Implement online fsck for ext* filesystems which live on LVM-managed
> > > logical volumes. The basic strategy mirrors that of e2croncheck --
> > > create a snapshot, fsck the snapshot, report whatever errors appear,
> > > remove snapshot. Unlike e2croncheck, this utility accepts any LVM
> > > device path, knows about snapshots running out of space, and can call
> > > fstrim after validating that the fs metadata is ok.
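The snapshot, fsck, report, remove, fstrim sequence described in the commit message can be sketched as a dry-run shell function. This is a minimal illustration under stated assumptions, not the e2scrub script itself. The function name `scrub_lv`, the 256M snapshot size, the `e2fsck -fn` (forced, read-only) invocation and the mountpoint argument passed to fstrim are all hypothetical choices. Following the patch's `${DBG}` convention, setting `DBG=echo` prints each command instead of running it.

```bash
#!/bin/bash
# Hypothetical dry-run sketch of the e2scrub flow: snapshot, fsck the snapshot,
# report, remove the snapshot, then fstrim only if the check was clean.
# The 256M size, e2fsck -fn and the mountpoint argument are assumptions.
# Set DBG=echo to print each command instead of executing it.

DBG="${DBG:-}"

scrub_lv() {
    local vg="$1" lv="$2" mnt="$3"
    local snap="${lv}.e2scrub"
    local snap_dev="/dev/${vg}/${snap}"
    local fsck_ret

    # Create a point-in-time snapshot of the origin LV (fd 3 closed for LVM).
    ${DBG} lvcreate -s -L 256M -n "${snap}" "${vg}/${lv}" 3>&- || return 1

    # Force a full check (-f) without modifying anything (-n).
    ${DBG} e2fsck -fn "${snap_dev}"
    fsck_ret=$?

    # Report problems, then remove the snapshot whatever the result.
    if [ "${fsck_ret}" -ne 0 ]; then
        echo "${vg}/${lv}: e2fsck exited with status ${fsck_ret}" >&2
    fi
    ${DBG} lvremove -f "${vg}/${snap}" 3>&-

    # Trim the live filesystem only once its metadata looked clean.
    if [ "${fsck_ret}" -eq 0 ]; then
        ${DBG} fstrim "${mnt}"
    fi
    return "${fsck_ret}"
}
```

For example, `DBG=echo scrub_lv vg0 home /home` prints the lvcreate, e2fsck, lvremove and fstrim commands in that order; `vg0`, `home` and `/home` are placeholder names.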
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@...cle.com>
> > >
> > > diff --git a/scrub/e2scrub.in b/scrub/e2scrub.in
> > > new file mode 100644
> > > index 0000000..647f0e6
> > > --- /dev/null
> > > +++ b/scrub/e2scrub.in
> > > @@ -0,0 +1,207 @@
> > > +#!/bin/bash
> > > +
> > > +# Copyright (C) 2018 Oracle. All Rights Reserved.
> > > +#
> > > +# Author: Darrick J. Wong <darrick.wong@...cle.com>
> > > +#
> > > +# This program is free software; you can redistribute it and/or
> > > +# modify it under the terms of the GNU General Public License
> > > +# as published by the Free Software Foundation; either version 2
> > > +# of the License, or (at your option) any later version.
> > > +#
> > > +# This program is distributed in the hope that it would be useful,
> > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > > +# GNU General Public License for more details.
> > > +#
> > > +# You should have received a copy of the GNU General Public License
> > > +# along with this program; if not, write the Free Software Foundation,
> > > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
> >
> > I think it is preferable to point to http://www.gnu.org/licenses/gpl-2.0.html
> > instead, since this snail mail address has changed in the past, and it is
> > unlikely that anyone would use it in any case.
> >
> > > +# Automatically check an LVM-managed filesystem online.
> > > +# We use LVM snapshots to do this, which means that we can only
> > > +# check filesystems in VGs that have at least 256mb (or so) of
> >
> > s/mb/MB/
>
> Ok.
>
> > > +# Make sure this is an LVM device we can snapshot
> > > +lvm_vars="$(lvs --nameprefixes -o name,vgname,lv_role --noheadings "${dev}" 2> /dev/null)"
> > > +eval "${lvm_vars}"
> > > +if [ -z "${LVM2_VG_NAME}" ] || [ -z "${LVM2_LV_NAME}" ] ||
> > > + echo "${LVM2_LV_ROLE}" | grep -q "snapshot"; then
> > > + echo "${dev}: Not an LVM logical volume."
> > > + print_help
> > > + exit 16
> > > +fi
> > > +start_time="$(date +'%Y%m%d%H%M%S')"
> > > +snap="${LVM2_LV_NAME}.e2scrub"
> > > +snap_dev="/dev/${LVM2_VG_NAME}/${snap}"
> > > +
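The device check above relies on `lvs --nameprefixes`, which prints `NAME='value'` pairs that `eval` can turn directly into shell variables. Below is a minimal sketch of the same parsing, wrapped in a hypothetical `parse_lvs_line` function so it can be exercised without LVM. It swaps the `echo | grep -q` role test for an equivalent bash substring match, and the sample `lvs` lines used with it are illustrative rather than captured output.

```bash
#!/bin/bash
# Hypothetical helper: parse one line of
#   lvs --nameprefixes -o name,vgname,lv_role --noheadings DEV
# and derive the snapshot names the way the patch does.
# Returns 1 if the device is not a snapshottable logical volume.

parse_lvs_line() {
    local lvm_vars="$1"
    # Reset so values from a previous device cannot leak through.
    LVM2_LV_NAME= LVM2_VG_NAME= LVM2_LV_ROLE=
    # --nameprefixes emits NAME='value' pairs, so eval assigns them directly.
    eval "${lvm_vars}"
    # Same checks as the patch; [[ == *snapshot* ]] replaces echo | grep -q.
    if [ -z "${LVM2_VG_NAME}" ] || [ -z "${LVM2_LV_NAME}" ] ||
       [[ "${LVM2_LV_ROLE}" == *snapshot* ]]; then
        return 1
    fi
    snap="${LVM2_LV_NAME}.e2scrub"
    snap_dev="/dev/${LVM2_VG_NAME}/${snap}"
}
```

On a real system the argument would come from the patch's own `lvs --nameprefixes -o name,vgname,lv_role --noheadings "${dev}"` call.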
> > > +teardown() {
> > > + # Remove and wait for removal to succeed.
> > > + ${DBG} lvremove -f "${LVM2_VG_NAME}/${snap}" 3>&-
> >
> > It isn't clear to me what fd 3 is for in these commands.
>
> For whatever reason, LVM tools complain about leaked file descriptors if
> fd 3 is open, and systemd and cron sometimes start jobs with fd 3 open.
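What `3>&-` does can be shown without LVM at all: it closes fd 3 for a single command, so that command (and anything it starts) does not inherit the descriptor. In this sketch `exec 3>/dev/null` stands in for a descriptor that cron or systemd might pass down, and the hypothetical `fd3_state` helper reports whether fd 3 is open. The same inheritance rules apply to external programs such as lvremove.

```bash
#!/bin/bash
# Sketch: show the effect of 3>&- on a child process.

fd3_state() {
    # Redirecting to a closed fd fails; the outer 2>/dev/null hides the
    # shell's "Bad file descriptor" message so only the result is printed.
    if { true >&3; } 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

exec 3>/dev/null                 # simulate a descriptor handed down by the caller
inherited="$(fd3_state)"         # the child inherits fd 3
closed="$(fd3_state 3>&-)"       # fd 3 is closed for this one command only
exec 3>&-                        # clean up in the parent

echo "without 3>&-: ${inherited}; with 3>&-: ${closed}"
```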
>
> > > + while [ "$?" -eq "5" ] && [ -e "${snap_dev}" ]; do
> > > + sleep 0.5
> > > + ${DBG} lvremove -f "${LVM2_VG_NAME}/${snap}" 3>&-
> > > + done
> >
> > This while loop could be slightly restructured to avoid multiple lvremove
> > commands, like:
> >
> > teardown() {
> > # Remove and wait for removal to succeed.
> > while [ -e "${snap_dev}" ] &&
> > [ `${DBG} lvremove -f "${LVM2_VG_NAME}/${snap}" 3>&-` -eq "5" ]; do
>
> But that's not equivalent. The patch runs lvremove and compares the
> return value to 5, whereas this captures the stdout of lvremove and
> compares the stdout data to 5.
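The distinction drawn here can be checked directly: backticks or `$(...)` substitute a command's standard output, while `$?` holds its exit status. The hypothetical `fake_lvremove` below prints a line and exits with 5, the status the patch treats as retryable; its message text is made up.

```bash
#!/bin/bash
# fake_lvremove is a hypothetical stand-in: it prints a line and exits 5.
fake_lvremove() {
    echo "hypothetical lvremove message"
    return 5
}

# Command substitution captures stdout, which is what the backtick form compares.
captured="$(fake_lvremove)"

# $? is the exit status, which is what the patch compares. Save it at once,
# because the next command (even a [ test ]) overwrites it.
fake_lvremove > /dev/null
status=$?

# Comparing the captured text numerically is an error ("integer expression
# expected"), so the test fails instead of matching 5.
if [ "${captured}" -eq 5 ] 2>/dev/null; then
    stdout_matches=yes
else
    stdout_matches=no
fi
echo "captured='${captured}' status=${status} stdout_matches=${stdout_matches}"
```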
>
> > sleep 0.5
> > done
> > }
> >
> > That said, should this fail after some number of retries? What if there
> > is another e2scrub running on this device keeping it busy? Should that
> > be checked separately?
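One way to detect a concurrent e2scrub on the same device, as asked above, is an advisory lock rather than a pid file. The sketch below uses flock(1) from util-linux; the helper name `lock_lv` and the choice of fd 9 are hypothetical, and this is not what the patch does. Because the kernel releases the lock when the last copy of the descriptor is closed, a run that dies cannot leave a stale lock behind.

```bash
#!/bin/bash
# Hypothetical per-LV lock using flock(1) instead of a pid file.

lock_lv() {
    local lockfile="$1"
    # Open the lock file on fd 9 for the rest of this shell's life. The lock
    # is dropped automatically when the last copy of fd 9 is closed.
    exec 9>"${lockfile}" || return 1
    # -n: fail immediately instead of waiting if another run holds the lock.
    flock -n 9
}
```

A caller might run `lock_lv "/run/e2scrub-${LVM2_VG_NAME}-${LVM2_LV_NAME}.lock" || exit 1` before creating the snapshot (the lock path is a placeholder). Note that fd 9 would then be inherited by lvcreate and lvremove just like fd 3, so those calls would likely need `9>&-` as well to avoid the leaked-descriptor warnings.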
>
> There's a small window in which concurrent e2scrubs can interfere with each
> other (one creates the snapshot and goes to e2fsck while the other one
> tears it down). It's a pity there isn't a way to tell lvm to create an
> O_TMPFILE-like snapshot, feed it to e2fsck, and have it automatically
> disappear when the fd closes.
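Short of an O_TMPFILE-style snapshot, a common shell approximation is an EXIT trap, which tears the snapshot down when the (sub)shell exits, whether the check succeeds or fails. A minimal sketch, assuming hypothetical `create_snapshot` and `remove_snapshot` helpers that wrap the lvcreate and lvremove calls; `run_with_cleanup` is also an illustrative name. It does not close the window between concurrent runs described above.

```bash
#!/bin/bash
# Hypothetical helpers create_snapshot/remove_snapshot stand in for the
# lvcreate/lvremove calls; they are expected to be defined by the caller.

run_with_cleanup() {
    (
        # Run in a subshell so the trap and any 'exit' stay local to this run.
        create_snapshot || exit 1
        # Install the trap only once there is a snapshot to remove.
        trap 'remove_snapshot' EXIT
        # Run the check command (e.g. e2fsck on the snapshot); its status
        # becomes the subshell's status, and the EXIT trap fires afterwards.
        "$@"
    )
}
```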
>
> TBH I was intending this to run as an automatic background systemd
> service, which provides the necessary isolation without having to go
> figure out pid files for non-systemd systems. I guess we can talk
> tomorrow about this assuming there's an ext4 call...
...which we did, and the upshot is that I'll rework the lvremove
loop at startup to bail out after a few retries.
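A bounded version of the teardown loop might look like the sketch below. It is a hypothetical shape for that rework, not the author's code: `max_tries` and its default of 5 are assumptions, while the 0.5 s delay and the treatment of status 5 as retryable follow the patch. It also calls lvremove from a single place, as Andreas suggested, and saves the status in `ret` straight away, since `$?` always reflects the most recent command: in a condition like `[ -e "${snap_dev}" ] && [ "$?" -eq 5 ]`, the second test sees the status of the `[ -e ]` test, not of lvremove.

```bash
#!/bin/bash
# Hypothetical bounded teardown. DBG, LVM2_VG_NAME, snap and snap_dev are
# assumed to be set as in the patch; max_tries (default 5) is an assumption.

teardown() {
    local max_tries="${1:-5}" tries=0 ret
    while [ -e "${snap_dev}" ]; do
        ${DBG} lvremove -f "${LVM2_VG_NAME}/${snap}" 3>&-
        ret=$?                      # save lvremove's status before any test
        if [ "${ret}" -ne 5 ]; then
            return "${ret}"         # removed (0) or a non-retryable failure
        fi
        tries=$((tries + 1))
        if [ "${tries}" -ge "${max_tries}" ]; then
            echo "${snap_dev}: lvremove still failing after ${tries} tries" >&2
            return 1
        fi
        sleep 0.5
    done
    return 0
}
```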
--D
> --D
>
> > Cheers, Andreas