lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 13 Jan 2008 14:05:42 +0300
From:	Al Boldi <a1426z@...ab.com>
To:	Theodore Tso <tytso@....EDU>
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFD] Incremental fsck

Theodore Tso wrote:
> On Wed, Jan 09, 2008 at 02:52:14PM +0300, Al Boldi wrote:
> > Ok, but let's look at this a bit more opportunistic / optimistic.
> >
> > Even after a black-out shutdown, the corruption is pretty minimal, using
> > ext3fs at least.
>
> After a unclean shutdown, assuming you have decent hardware that
> doesn't lie about when blocks hit iron oxide, you shouldn't have any
> corruption at all.  If you have crappy hardware, then all bets are off....

Maybe with barriers...

> > So let's take advantage of this fact and do an optimistic fsck, to
> > assure integrity per-dir, and assume no external corruption.  Then
> > we release this checked dir to the wild (optionally ro), and check
> > the next.  Once we find external inconsistencies we either fix it
> > unconditionally, based on some preconfigured actions, or present the
> > user with options.
>
> So what can you check?  The *only* thing you can check is whether or
> not the directory syntax looks sane, whether the inode structure looks
> sane, and whether or not the blocks reported as belong to an inode
> looks sane.

Which would make this dir/area ready for read/write access.

> What is very hard to check is whether or not the link count on the
> inode is correct.  Suppose the link count is 1, but there are actually
> two directory entries pointing at it.  Now when someone unlinks the
> file through one of the directory hard entries, the link count will go
> to zero, and the blocks will start to get reused, even though the
> inode is still accessible via another pathname.  Oops.  Data Loss.

We could buffer this, and only actually overwrite when we are completely 
finished with the fsck.

> This is why doing incremental, on-line fsck'ing is *hard*.  You're not
> going to find this while doing each directory one at a time, and if
> the filesystem is changing out from under you, it gets worse.  And
> it's not just the hard link count.  There is a similar issue with the
> block allocation bitmap.  Detecting the case where two files are
> simultaneously can't be done if you are doing it incrementally, and if
> the filesystem is changing out from under you, it's impossible, unless
> you also have the filesystem telling you every single change while it
> is happening, and you keep an insane amount of bookkeeping.

Ok, you have a point, so how about we change the implementation detail a bit, 
from external fsck to internal fsck, leveraging the internal fs bookkeeping, 
while allowing immediate but controlled read/write access.


Thanks for more thoughts!

--
Al

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ