linux-ext4 - Re: New service e2scrub_reap

Message-ID: <20190318233238.GE4936@magnolia>
Date:   Mon, 18 Mar 2019 16:32:38 -0700
From:   "Darrick J. Wong" <darrick.wong@...cle.com>
To:     Paul Menzel <pmenzel@...gen.mpg.de>
Cc:     "Theodore Ts'o" <tytso@....edu>, linux-ext4@...r.kernel.org
Subject: Re: New service e2scrub_reap

On Mon, Mar 18, 2019 at 11:03:59PM +0100, Paul Menzel wrote:
> Dear Ted,
> 
> 
> On 18.03.19 22:47, Theodore Ts'o wrote:
> > On Mon, Mar 18, 2019 at 12:24:55PM +0100, Paul Menzel wrote:
> 
> > > On Debian Sid/unstable, I noticed the new service `scrub/e2scrub_reap.service`
> > > installed in the default target [1][2].
> > > 
> > > $ systemctl status -o short-precise e2scrub_reap.service
> > > ● e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots
> > >     Loaded: loaded (/lib/systemd/system/e2scrub_reap.service; enabled; vendor preset: enabled)
> > >     Active: inactive (dead) since Mon 2019-03-18 12:17:13 CET; 1min 1s ago
> > >       Docs: man:e2scrub_all(8)
> > >    Process: 447 ExecStart=/sbin/e2scrub_all -A -r (code=exited, status=0/SUCCESS)
> > >   Main PID: 447 (code=exited, status=0/SUCCESS)
> > > 
> > > Mar 18 12:17:08.223560 plumpsklo systemd[1]: Starting Remove Stale Online ext4 Metadata Check Snapshots...
> > > Mar 18 12:17:13.996465 plumpsklo systemd[1]: e2scrub_reap.service: Succeeded.
> > > Mar 18 12:17:13.996808 plumpsklo systemd[1]: Started Remove Stale Online ext4 Metadata Check Snapshots.
> > 
> > Yeah, that's unfortunate.  I'm seeing a similar time on my (fairly
> > high-end) laptop:
> > 
> > # time e2scrub_all -A -r
> > 
> > real	0m4.356s
> > user	0m0.677s
> > sys	0m1.285s
> 
> Thank you for your response and tests.
> 
> > We should be able to fix this in general by avoiding the use of lsblk
> > at all, and in the case of e2scrub -r, just simply iterating over the
> > output of:
> > 
> > lvs --name-prefixes -o vg_name,lv_name,lv_path,origin -S lv_role=snapshot
> > 
> > (which takes about a fifth of a second on my laptop and it should be
> > even faster if there are no LVM volumes on the system)
> > 
> > And without the -r option, we should just be able to do this:
> > 
> > lvs --name-prefixes -o vg_name,lv_name,lv_path -S lv_active=active,lv_role=public
> > 
> > Right now we're calling lvs for every single block device emitted by
> > lsblk, and from what I can tell, we can do a much better job
> > optimizing e2scrub_all.
> 
> Indeed. That sounds like a way to improve the situation.
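
Ted's reap query can be consumed in a single pass. The following is a minimal bash sketch, assuming each record arrives as one line of `KEY='value'` pairs such as `LVM2_LV_PATH='...'` (the form produced by the name-prefix option of lvs) and that values contain no single quotes. The helper names `get_field` and `list_snapshot_records` are hypothetical and not part of e2scrub_all. The sketch only lists candidates; it removes nothing.

```shell
#!/usr/bin/env bash
# Sketch only: read lvs name-prefixed records (KEY='value' pairs) on stdin
# and list snapshot LVs. Assumes values never contain a single quote.

# get_field KEY LINE: print the value of KEY='...' in LINE, or fail.
get_field() {
	local key=$1 line=$2
	local re="(^|[[:space:]])${key}='([^']*)'"
	[[ $line =~ $re ]] || return 1
	printf '%s\n' "${BASH_REMATCH[2]}"
}

# list_snapshot_records: print "<lv_path><TAB><origin>" for every record
# that has a non-empty origin. Lines without an LVM2_LV_PATH field (such
# as a column-heading line) are skipped rather than evaluated.
list_snapshot_records() {
	local line path origin
	while IFS= read -r line; do
		path=$(get_field LVM2_LV_PATH "$line") || continue
		origin=$(get_field LVM2_ORIGIN "$line") || continue
		if [ -n "$origin" ]; then
			printf '%s\t%s\n' "$path" "$origin"
		fi
	done
}
```

For example: `lvs --nameprefixes --noheadings -o vg_name,lv_name,lv_path,origin -S lv_role=snapshot | list_snapshot_records` (the option is spelled `--nameprefixes` in lvs(8)). Matching with a regular expression instead of `eval` means a stray heading line is ignored rather than executed. A real reaper would also have to confirm that each snapshot was created by e2scrub before handing it to `lvremove`; that check is not shown.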

That's ... interesting.  On my developer workstations (Ubuntu 16.04 and
18.04) it generally takes 1/10th the amount of time to run
e2scrub_all.

Even on my aging ~2010-era server, which has only spinning disks, it takes 0.3s:

# time e2scrub_all -A -r

real    0m0.280s
user    0m0.160s
sys     0m0.126s

I wonder what's different between our computers.  Do you have an
lvm2-lvmetad service running?

However, since e2scrub is tied to LVM, Ted is right that calling lvs once
in the outer loop would be far more efficient.  I'll have a look at
reworking this.
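
Calling lvs in the outer loop amounts to running one lvs query and iterating over its records, instead of running lvs once per block device that lsblk reports. A minimal bash sketch of that shape, using Ted's target query with `--noheadings` added and the option spelled `--nameprefixes` as in lvs(8). `list_scrub_targets`, `get_field`, and the `LVS_CMD` override are hypothetical names; `LVS_CMD` exists only so the loop can be exercised without LVM.

```shell
#!/usr/bin/env bash
# Sketch: enumerate scrub targets with a single lvs call instead of one
# lvs call per block device. LVS_CMD is a hypothetical override hook for
# testing; by default the real lvs is run.

# get_field KEY LINE: print the value of KEY='...' in LINE, or fail.
get_field() {
	local key=$1 line=$2
	local re="(^|[[:space:]])${key}='([^']*)'"
	[[ $line =~ $re ]] || return 1
	printf '%s\n' "${BASH_REMATCH[2]}"
}

# list_scrub_targets: print the path of every active, public LV reported
# by one lvs invocation. Errors (e.g. lvs not installed) are silenced, in
# which case nothing is printed.
list_scrub_targets() {
	local lvs_cmd=${LVS_CMD:-lvs}
	local line path
	"$lvs_cmd" --nameprefixes --noheadings -o vg_name,lv_name,lv_path \
		-S lv_active=active,lv_role=public 2>/dev/null |
	while IFS= read -r line; do
		path=$(get_field LVM2_LV_PATH "$line") || continue
		printf '%s\n' "$path"
	done
}
```

Each printed path would then be passed to whatever performs the per-volume check, so enumerating volumes costs one lvs invocation no matter how many block devices the system has.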

> > > Reading the manual, the switch `-r` “removes e2scrub snapshots but do not
> > > check anything”.
> > > 
> > > Does this have to be done during boot-up, or could it be done after the
> > > default target was reached, or even during shutting down?
> > 
> > This shouldn't be blocking any other targets; I think there should be
> > a way to configure the unit file so that it runs in parallel with the
> > other systemd units.  My systemd-fu is not super strong, so I'll have
> > to do some investigating to see how we can fix this.
> 
> Sorry about my wording. It’s not about blocking targets, but about an
> additional program that competes for resources. Until the graphical target
> (or graphical login manager) is reached on my system, a lot of processes
> are already waiting for CPU time. That is the bottleneck during boot-up on
> my system.
> 
> So it’d be great if services that do not actually have to run during
> boot-up were only started after the default target has been reached.
> Something like the ordering dependency
> 
>     After=default.target
> 
> which, to my knowledge, does not work, though. I’ll ask the systemd folks
> again.

The biggest risk of delaying that is that, if the system crashed while
the root fs was being scrubbed, the leftover snapshot could run out of
space as the rest of the system comes back up and writes to the root fs.
However, this service can run in parallel with the other tasks; there's
no need for it to run solo.
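
One way to act on the resource-contention concern while keeping the unit running in parallel, not something proposed in this thread, is a local drop-in that lowers the reaper's scheduling priority. A minimal sketch, assuming systemd's standard drop-in directory (`/etc/systemd/system/<unit>.d/`) and the `Nice=` and `IOSchedulingClass=` settings documented in systemd.exec(5); the function `write_reap_dropin` and the file name `lower-priority.conf` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a local drop-in that lowers the CPU and I/O priority of
# e2scrub_reap.service. The file name lower-priority.conf is hypothetical.

# write_reap_dropin [ROOT]: create ROOT/e2scrub_reap.service.d/lower-priority.conf
# (ROOT defaults to /etc/systemd/system) and print the file's path.
write_reap_dropin() {
	local root=${1:-/etc/systemd/system}
	local dir="$root/e2scrub_reap.service.d"
	mkdir -p "$dir" || return 1
	# Nice=19 is the lowest CPU priority; IOSchedulingClass=idle selects the
	# idle I/O class (see systemd.exec(5) and ionice(1)).
	cat > "$dir/lower-priority.conf" <<'EOF' || return 1
[Service]
Nice=19
IOSchedulingClass=idle
EOF
	printf '%s\n' "$dir/lower-priority.conf"
}
```

For example, run `write_reap_dropin` as root and then `systemctl daemon-reload`; `systemctl edit e2scrub_reap.service` is the interactive way to create a similar drop-in. This leaves the unit's ordering unchanged: it still starts during boot, it is just given a lower share of CPU and I/O than the other boot tasks.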

--D

>
> 
> Kind regards,
> 
> Paul