linux-ext4 - Re: [Lsf-pc] [LSF/MM TOPIC] Use generic FS in virtual environments challenges and solutions

Message-ID: <87r47pk2nf.fsf@openvz.org>
Date:	Thu, 30 Jan 2014 17:41:40 +0400
From:	Dmitry Monakhov <dmonakhov@...nvz.org>
To:	Jan Kara <jack@...e.cz>
Cc:	lsf-pc@...ts.linux-foundation.org, linux-fsdevel@...r.kernel.org,
	linux-ext4@...r.kernel.org,
	Konstantin Khorenko <khorenko@...allels.com>,
	Pavel Emelianov <xemul@...allels.com>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] Use generic FS in virtual environments challenges and solutions

On Thu, 30 Jan 2014 11:05:35 +0100, Jan Kara <jack@...e.cz> wrote:
> On Thu 30-01-14 11:51:20, Dmitry Monakhov wrote:
> > > >    B) Reduce fsck time. Theodore Tso has announced an initiative to
> > > >       implement ffck for ext4 [3]. I want to discuss the prospects for
> > > >       designing and implementing online fsck for ext4.
> > >   Well, this comes up every once in a while and the answer is always the
> > > same. Checking might be reasonably doable but comes almost for free when
> > > using LVM snapshots and doing fsck on the snapshot. Fixing a read-write
> > > filesystem - good luck.
> > But what about merging data from the fixed snapshot back into the original
> > image?
> > 
> > ---time-axis------------------------------------------------->
> > FS0----[Error]---[write-new-data]----------------->X????
> >          |                                         |
> > FS0-snap \-----[start fsck]-----[errors corrected]-/
> > Obviously there is no way to merge the fixed snapshot into the modified
> > filesystem.
>   Yes, snapshots are good only for read-only checks. If they find errors,
> you have to bite the bullet, unmount the fs and run fsck. However fsck
> finding errors should be rare enough, or do you have other experience?
Well, most of the errors we observed were caused by instability in the block
layer. But we have run into a law-of-large-numbers effect: in our case each HW
node hosts 100-1000 containers, and each container has a dedicated fs image, so
the number of errors is not negligible.
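
To make the scale effect concrete, here is a rough sketch of the arithmetic.
The per-image corruption probability (0.1% over some period) is a made-up
figure for illustration, not a measurement, and the images are assumed to fail
independently. The function name is hypothetical.

```python
def p_any_corrupt(p_image: float, n_images: int) -> float:
    """Probability that at least one of n_images independent fs images
    is corrupted, given a per-image corruption probability p_image."""
    return 1.0 - (1.0 - p_image) ** n_images


if __name__ == "__main__":
    # 0.001 is an assumed, illustrative per-image probability.
    for n in (1, 100, 1000):
        print(f"{n:5d} images: {p_any_corrupt(0.001, n):.3f}")
```

Under these assumptions an event that is rare for a single image becomes
likely on a node with 1000 images.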
> 
> > So the only option we have after we have discovered an error on FS0-snap
> > is to umount FS0 and run fsck on it. As a result we double the disk load
> > and still have long downtime. But what if the error was relatively simple
> > (wrong group stats, or wrong i_blocks for an inode)? Then it is possible
> > to fix it online. My proposal is to start a discussion about the list of
> > issues which can be fixed online.
>   The trouble is that to reliably check even such a simple thing as group
> stats or i_blocks, you have to freeze all modifications to the group /
> inode, make the kernel flush all its internal state for these objects,
> check + fix them, make the kernel reread the new info, and unfreeze these
> objects. So a lot of work for even the simplest fixes and it's not clear to
> me why people should hit fs corruption often enough to warrant the
> complications.
> 
> There are also other guys who want to be able to make some groups not
> available for allocation, so if we spot some inconsistency in group metadata,
> we simply won't do allocation from it anymore and then run fsck to fix the
> damage during scheduled downtime. That is much easier to implement and an
> approach like this should go a long way towards making a corrupted filesystem
> still usable.
That looks reasonable. 
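
As a toy illustration of the "stop allocating from a suspicious group" idea,
here is a minimal user-space model. It is not ext4 code: the Group and
Allocator classes, the quarantined flag and the consistency check (cached free
count versus bitmap) are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Group:
    free_blocks_counter: int   # cached free-block count (as in a group descriptor)
    bitmap: list               # True = block in use, False = free
    quarantined: bool = False  # hypothetical "do not allocate here" flag

    def consistent(self) -> bool:
        # The cached counter must agree with the bitmap.
        return self.free_blocks_counter == self.bitmap.count(False)


class Allocator:
    def __init__(self, groups):
        self.groups = groups

    def alloc_block(self):
        """Return (group_index, block_index), or None if nothing is available."""
        for gi, g in enumerate(self.groups):
            if g.quarantined:
                continue
            if not g.consistent():
                # Do not try to repair online; just stop using the group
                # and leave the fix to fsck during scheduled downtime.
                g.quarantined = True
                continue
            if g.free_blocks_counter == 0:
                continue
            bi = g.bitmap.index(False)
            g.bitmap[bi] = True
            g.free_blocks_counter -= 1
            return gi, bi
        return None
```

In this model a group whose counter disagrees with its bitmap is skipped from
then on, while allocation continues from the remaining healthy groups.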
> 
> 								Honza
> -- 
> Jan Kara <jack@...e.cz>
> SUSE Labs, CR