linux-ext4 - Re: [RFC PATCH v1 00/30] fs: inode->i

Message-ID: <87bmsbiqzz.fsf@notabene.neil.brown.name>
Date:   Wed, 05 Apr 2017 11:26:40 +1000
From:   NeilBrown <neil@...wn.name>
To:     Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>
Cc:     "J. Bruce Fields" <bfields@...ldses.org>,
        Jeff Layton <jlayton@...hat.com>,
        Christoph Hellwig <hch@...radead.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-nfs@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-btrfs@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization

On Tue, Apr 04 2017, Dave Chinner wrote:

> On Mon, Apr 03, 2017 at 04:00:55PM +0200, Jan Kara wrote:
>> On Sun 02-04-17 09:05:26, Dave Chinner wrote:
>> > On Thu, Mar 30, 2017 at 12:12:31PM -0400, J. Bruce Fields wrote:
>> > > On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote:
>> > > > On Thu, 2017-03-30 at 08:47 +0200, Jan Kara wrote:
>> > > > > Because if the above is acceptable we could make the reported i_version a sum
>> > > > > of "superblock crash counter" and "inode i_version". We increment
>> > > > > "superblock crash counter" whenever we detect unclean filesystem shutdown.
>> > > > > That way after a crash we are guaranteed each inode will report new
>> > > > > i_version (the sum would probably have to look like "superblock crash
>> > > > > counter" * 65536 + "inode i_version" so that we avoid reusing possible
>> > > > > i_version numbers we gave away but did not write to disk but still...).
>> > > > > Thoughts?
>> > > 
>> > > How hard is this for filesystems to support?  Do they need an on-disk
>> > > format change to keep track of the crash counter?
>> > 
>> > Yes. We'll need a version counter in the superblock, and we'll need to
>> > know what the increment semantics are. 
>> > 
>> > The big question is how do we know there was a crash? The only thing
>> > a journalling filesystem knows at mount time is whether it is clean
>> > or requires recovery. Filesystems can require recovery for many
>> > reasons that don't involve a crash (e.g. root fs is never unmounted
>> > cleanly, so always requires recovery). Further, some filesystems may
>> > not even know there was a crash at mount time because their
>> > architecture always leaves a consistent filesystem on disk (e.g. COW
>> > filesystems)....
>> 
>> What filesystems can or cannot easily do obviously differs. Ext4 has a
>> recovery flag set in superblock on RW mount/remount and cleared on
>> umount/RO remount.
>
> Even this doesn't help. A recent bug was reported to the XFS
> list: it turns out that systemd can't remount-ro the root
> filesystem successfully on shutdown because there are open write fds
> on the root filesystem when it attempts the remount. So it just
> reboots without a remount-ro. This uncovered a bug in grub in

Filesystems could use register_reboot_notifier() to get a notification
that even systemd cannot stuff up.  It could check for dirty data and, if
there is none (which there shouldn't be if a sync happened), do a
single write to disk to mark the superblock clean (or a single write to each
disk... or something).
md does this, because getting the root device to be marked read-only is
even harder than getting the root filesystem to be remounted read-only.

NeilBrown
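Jan Kara's proposal quoted above (report "superblock crash counter" * 65536 + "inode i_version") can be sketched as plain arithmetic. This is a minimal illustration, not code from any filesystem: the names `CRASH_EPOCH_SHIFT`, `reported_i_version()` and `post_crash_values_are_new()` are hypothetical, and the model assumes that after a crash the counter is bumped by one and the inode resumes from its on-disk i_version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: "crash counter * 65536" expressed as a 16-bit shift. */
#define CRASH_EPOCH_SHIFT 16

/* Hypothetical helper: the i_version a filesystem would report (e.g. to
 * NFS), combining the superblock crash counter with the inode's counter. */
static uint64_t reported_i_version(uint32_t sb_crash_counter,
                                   uint64_t inode_i_version)
{
    return ((uint64_t)sb_crash_counter << CRASH_EPOCH_SHIFT) + inode_i_version;
}

/* Hypothetical check: after a crash, the counter is incremented and the
 * inode restarts from its on-disk i_version. Values reported afterwards
 * are new only if the smallest post-crash value exceeds the largest value
 * that may have been handed out from memory before the crash. */
static bool post_crash_values_are_new(uint32_t crash_counter_before,
                                      uint64_t in_memory_version,
                                      uint64_t on_disk_version)
{
    /* The persisted counter can never be ahead of the in-memory one. */
    assert(on_disk_version <= in_memory_version);

    uint64_t last_before =
        reported_i_version(crash_counter_before, in_memory_version);
    uint64_t first_after =
        reported_i_version(crash_counter_before + 1, on_disk_version);
    return first_after > last_before;
}
```

For example, `reported_i_version(1, 7)` is 65543. The check reduces to "fewer than 65536 increments of that inode were lost in the crash": `post_crash_values_are_new(0, 10, 7)` holds, while `post_crash_values_are_new(0, 70000, 7)` does not, which is one way to read the "but still..." caveat in Jan's message.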

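The mount-time flag Jan describes for ext4, combined with the reboot-time superblock write Neil suggests, can be modelled in userspace as below. This is a hedged sketch under stated assumptions: `struct sb_model`, `struct fs_model`, `model_mount_rw()` and `model_reboot_hook()` are hypothetical names, only RW mounts are modelled, and `model_reboot_hook()` merely stands in for the callback a filesystem would register with `register_reboot_notifier()`; it does not use the kernel notifier API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified on-disk superblock state. */
struct sb_model {
    bool needs_recovery;    /* set on RW mount, cleared on clean shutdown */
    uint32_t crash_counter; /* persisted counter from Jan Kara's proposal */
};

/* Hypothetical state of a mounted filesystem. */
struct fs_model {
    struct sb_model disk;      /* what is currently on disk */
    unsigned long dirty_pages; /* data not yet written back */
};

/* RW mount: a flag still set means the previous session never shut down
 * cleanly, so bump the crash counter, then set the flag for this session.
 * Memory starts empty after a reboot, so nothing is dirty yet. */
static void model_mount_rw(struct fs_model *fs)
{
    if (fs->disk.needs_recovery)
        fs->disk.crash_counter++;
    fs->disk.needs_recovery = true;
    fs->dirty_pages = 0;
}

/* Stand-in for a reboot notifier callback: if a sync left nothing dirty,
 * one superblock write marks the filesystem clean even though it was never
 * unmounted or remounted read-only. Returns true if it wrote. */
static bool model_reboot_hook(struct fs_model *fs)
{
    /* Only RW mounts are modelled, and those always have the flag set. */
    assert(fs->disk.needs_recovery);

    if (fs->dirty_pages != 0)
        return false; /* leave the flag set; next mount counts a crash */
    fs->disk.needs_recovery = false; /* the single superblock write */
    return true;
}
```

In this model a root filesystem that is synced but never remounted read-only still mounts cleanly next time, so the counter only moves when the reboot hook finds dirty data or never runs (a real crash). That addresses the systemd case Dave describes, though not his point about COW filesystems that have no recovery flag at all.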