linux-ext4 - Re: [RFC PATCH v1 00/30] fs: inode->i


Message-ID: <1490898932.2667.1.camel@redhat.com>
Date:   Thu, 30 Mar 2017 14:35:32 -0400
From:   Jeff Layton <jlayton@...hat.com>
To:     "J. Bruce Fields" <bfields@...ldses.org>
Cc:     Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@...radead.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-nfs@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-btrfs@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and
 optimization

On Thu, 2017-03-30 at 12:12 -0400, J. Bruce Fields wrote:
> On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote:
> > On Thu, 2017-03-30 at 08:47 +0200, Jan Kara wrote:
> > > Hum, so are we fine if i_version just changes (increases) for all inodes
> > > after a server crash? If I understand its use right, it would mean
> > > invalidation of all client's caches but that is not such a big deal given
> > > how frequent server crashes should be, right?
> 
> Even if it's rare, it may be really painful when all your clients are
> forced to throw out and repopulate their caches after a crash.  But,
> yes, maybe we can live with it.
> 

Yeah, assuming that normal reboots wouldn't cause this, then I don't see
it as being too bad.

> > > Because if above is acceptable we could make reported i_version to be a sum
> > > of "superblock crash counter" and "inode i_version". We increment
> > > "superblock crash counter" whenever we detect unclean filesystem shutdown.
> > > That way after a crash we are guaranteed each inode will report new
> > > i_version (the sum would probably have to look like "superblock crash
> > > counter" * 65536 + "inode i_version" so that we avoid reusing possible
> > > i_version numbers we gave away but did not write to disk but still...).
> > > Thoughts?
> 
> How hard is this for filesystems to support?  Do they need an on-disk
> format change to keep track of the crash counter?  Maybe not, maybe the
> high bits of the i_version counters are all they need.
> 

Yeah, I imagine we'd need an on-disk change for this unless there's
something already present that we could use in place of a crash counter.
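
As a rough sketch of the composition Jan describes above: the names
below are hypothetical and not taken from any existing filesystem, and
the 65536 stride is simply the figure from his mail. It assumes fewer
than 65536 increments are handed out without reaching disk before any
one crash.

```c
#include <stdint.h>

/* Hypothetical stride: the multiplier suggested in the quoted mail. */
#define CRASH_STRIDE UINT64_C(65536)

/*
 * Value reported to clients: the superblock crash counter scaled by
 * the stride, plus the inode's stored i_version. Bumping the crash
 * counter after an unclean shutdown moves every inode's reported
 * value past anything handed out before the crash, as long as the
 * unwritten increments stayed below CRASH_STRIDE.
 */
static inline uint64_t reported_i_version(uint64_t sb_crash_counter,
                                          uint64_t inode_i_version)
{
	return sb_crash_counter * CRASH_STRIDE + inode_i_version;
}
```

For example, if 100 was on disk and 105 had been handed out when the
crash hit, the post-crash value 65536 + 100 is still larger than 105.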

> > That does sound like a good idea. This is a 64 bit value, so we should
> > be able to carve out some upper bits for a crash counter without risking
> > wrapping.
> > 
> > The other constraint here is that we'd like any later version of the
> > counter to be larger than any earlier value that was handed out. I think
> > this idea would still satisfy that.
> 
> I guess we just want to have some back-of-the-envelope estimates of
> maximum number of i_version increments possible between crashes and
> maximum number of crashes possible over lifetime of a filesystem, to
> decide how to split up the bits.
> 
> I wonder if we could get away with using the new crash counter only for
> *new* values of the i_version?  After a crash, use the on disk i_version
> as is, and put off using the new crash counter until the next time the
> file's modified.
> 

That sounds difficult to get right. Suppose I have an inode that has not
been updated in a long time. Someone writes to it and then queries the
i_version. How do I know whether there were crashes since the last time
I updated it? Or am I misunderstanding what you're proposing here?
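
One possible reading of the proposal, sketched here under assumptions
that are not in the original mail: if the high bits of the stored
i_version hold the crash counter that was current at the last bump,
then comparing them with the superblock's counter answers the "were
there crashes since?" question at modification time. The 16/48 split
and every name below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical split: 16 bits of crash epoch, 48 bits of count. */
#define CRASH_BITS 16
#define COUNT_BITS (64 - CRASH_BITS)
#define CRASH_MASK ((UINT64_C(1) << CRASH_BITS) - 1)

/* Crash epoch recorded in the high bits of a stored i_version. */
static inline uint64_t iv_epoch(uint64_t v)
{
	return v >> COUNT_BITS;
}

/*
 * Next i_version for an inode that is being modified. If the stored
 * value was last bumped in an older crash epoch, start over in the
 * current epoch; otherwise just increment. Count overflow into the
 * epoch bits is not handled in this sketch.
 */
static inline uint64_t iv_bump(uint64_t stored, uint64_t sb_crash_counter)
{
	uint64_t epoch = sb_crash_counter & CRASH_MASK;

	if (iv_epoch(stored) != epoch)
		return epoch << COUNT_BITS;
	return stored + 1;
}
```

With this split, an inode would get 2^48 increments per epoch (about
8.9 years at one increment per microsecond) and the filesystem 65536
crashes; those figures are only illustrative.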

> That would still eliminate the risk of accidental reuse of an old
> i_version value.  It still leaves some cases where the client could fail
> to notice an update indefinitely.  All these cases I think have to
> assume that a writer made some changes that it failed to ever sync, so
> as long as we care only about close-to-open semantics perhaps those
> cases don't matter.
> 
> I wonder if repeated crashes can lead to any odd corner cases.
> 

-- 
Jeff Layton <jlayton@...hat.com>