linux-ext4 - Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7c71050e139a479e08ab7cf95e9e47da19a30687.camel@kernel.org>
Date:   Mon, 12 Sep 2022 08:55:04 -0400
From:   Jeff Layton <jlayton@...nel.org>
To:     Florian Weimer <fweimer@...hat.com>
Cc:     "J. Bruce Fields" <bfields@...ldses.org>,
        Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.cz>,
        NeilBrown <neilb@...e.de>, adilger.kernel@...ger.ca,
        djwong@...nel.org, david@...morbit.com, trondmy@...merspace.com,
        viro@...iv.linux.org.uk, zohar@...ux.ibm.com, xiubli@...hat.com,
        chuck.lever@...cle.com, lczerner@...hat.com, brauner@...nel.org,
        linux-man@...r.kernel.org, linux-api@...r.kernel.org,
        linux-btrfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, ceph-devel@...r.kernel.org,
        linux-ext4@...r.kernel.org, linux-nfs@...r.kernel.org,
        linux-xfs@...r.kernel.org
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
 STATX_INO_VERSION field

On Mon, 2022-09-12 at 14:13 +0200, Florian Weimer wrote:
> * Jeff Layton:
> 
> > To do this we'd need 2 64-bit fields in the on-disk and in-memory 
> > superblocks for ext4, xfs and btrfs. On the first mount after a crash,
> > the filesystem would need to bump s_version_max by the significant
> > increment (2^40 bits or whatever). On a "clean" mount, it wouldn't need
> > to do that.
> > 
> > Would there be a way to ensure that the new s_version_max value has made
> > it to disk? Bumping it by a large value and hoping for the best might be
> > ok for most cases, but there are always outliers, so it might be
> > worthwhile to make an i_version increment wait on that if necessary. 
> 
> How common are unclean shutdowns in practice?  Do ex64/XFS/btrfs keep
> counters in the superblocks for journal replays that can be read easily?
> 
> Several useful i_version applications could be negatively impacted by
> frequent i_version invalidation.
> 

One would hope "not very often", but Oopses _are_ something that happens
occasionally, even in very stable environments, and it would be best if
what we're building can cope with them. Consider:

reader				writer
----------------------------------------------------------
start with i_version 1
				inode updated in memory, i_version++
query, get i_version 2

 <<< CRASH : update never makes it to disk, back at 1 after reboot >>>

query, get i_version 1
				application restarts and redoes write, i_version at 2^40+1
query, get i_version 2^40+1 

The main thing we have to avoid here is giving out an i_version that
represents two different states of the same inode. This should achieve
that.

Something else we should consider though is that with enough crashes on
a long-lived filesystem, the value could eventually wrap. I think we
should acknowledge that fact in advance, and plan to deal with it
(particularly if we're going to expose this to userland eventually).

Because of the "seen" flag, we have a 63 bit counter to play with. Could
we use a similar scheme to the one we use to handle when "jiffies"
wraps? Assume that we'd never compare two values that were more than
2^62 apart? We could add i_version_before/i_version_after macros to make
it simple to handle this.
-- 
Jeff Layton <jlayton@...nel.org>