linux-kernel - Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO


Message-id: <166267685105.30452.17324304715046746056@noble.neil.brown.name>
Date:   Fri, 09 Sep 2022 08:40:51 +1000
From:   "NeilBrown" <neilb@...e.de>
To:     "Theodore Ts'o" <tytso@....edu>
Cc:     "Jan Kara" <jack@...e.cz>, "Jeff Layton" <jlayton@...nel.org>,
        "J. Bruce Fields" <bfields@...ldses.org>, adilger.kernel@...ger.ca,
        djwong@...nel.org, david@...morbit.com, trondmy@...merspace.com,
        viro@...iv.linux.org.uk, zohar@...ux.ibm.com, xiubli@...hat.com,
        chuck.lever@...cle.com, lczerner@...hat.com, brauner@...nel.org,
        fweimer@...hat.com, linux-man@...r.kernel.org,
        linux-api@...r.kernel.org, linux-btrfs@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        ceph-devel@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-nfs@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
 STATX_INO_VERSION field

On Fri, 09 Sep 2022, Theodore Ts'o wrote:
> On Thu, Sep 08, 2022 at 10:33:26AM +0200, Jan Kara wrote:
> > It boils down to the fact that we don't want to call mark_inode_dirty()
> > from the IOCB_NOWAIT path because for lots of filesystems that means a journal
> > operation and there is a high chance that it may block.
> > 
> > Presumably we could treat inode dirtying after i_version change similarly
> > to how we handle timestamp updates with lazytime mount option (i.e., not
> > dirty the inode immediately but only with a delay) but then the time window
> > for i_version inconsistencies due to a crash would be much larger.
> 
> Perhaps this is a radical suggestion, but there seem to be a lot of
> problems which are due to the concern "what if the file system
> crashes" (and so we need to worry about making sure that any
> increments to i_version MUST be persisted after it is incremented).
> 
> Well, if we assume that unclean shutdowns are rare, then perhaps we
> shouldn't be optimizing for that case.  So.... what if a file system
> had a counter which got incremented each time its journal is replayed
> representing an unclean shutdown.  That shouldn't happen often, but if
> it does, there might be any number of i_version updates that may have
> gotten lost.  So in that case, the NFS client should invalidate all of
> its caches.
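Ted's crash-counter idea could look roughly like the sketch below. It is an illustration under stated assumptions, not anything from an actual filesystem: every name (`fs_state`, `cached_attr`, `fs_note_journal_replay`, `cache_still_valid`) is hypothetical, and it assumes the client can somehow learn the filesystem's current crash count alongside each inode's i_version (the mail does not say how that would be exposed).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-filesystem state: bumped once per journal replay,
 * i.e. once per detected unclean shutdown. */
struct fs_state {
    uint32_t crash_count;
};

/* Hypothetical record of what a client cached about a file. */
struct cached_attr {
    uint32_t crash_count;  /* filesystem crash counter at caching time */
    uint64_t i_version;    /* per-inode change counter at caching time */
};

/* Hypothetical hook, called at mount time when the journal was replayed. */
static void fs_note_journal_replay(struct fs_state *fs)
{
    fs->crash_count++;
}

/* Client-side check: cached data is trusted only if no unclean shutdown
 * has been recorded since it was cached AND i_version is unchanged. */
static bool cache_still_valid(const struct cached_attr *cached,
                              uint32_t cur_crash_count,
                              uint64_t cur_i_version)
{
    if (cached->crash_count != cur_crash_count)
        return false;  /* a crash may have lost i_version updates */
    return cached->i_version == cur_i_version;
}
```

The point of the design is that the common, clean-shutdown path never needs to persist every i_version increment synchronously; the rare crash case is handled by invalidating everything at once.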

I was also thinking that the filesystem could help close that gap, but I
didn't like the "whole filesystem is dirty" approach.
I instead imagined a "dirty" bit in the on-disk inode which would be set
soon after any open-for-write and cleared when the inode is finally
written once there are no active opens and no unflushed data.
The "soon after" would put an upper bound on the window of possibly lost
version updates (which people seem to be comfortable with) without
imposing a synchronous I/O operation on open (for the first write).
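The lifecycle of the proposed on-disk dirty bit can be sketched as a small state machine. This is a userspace simulation with hypothetical names (`demo_inode`, `demo_open_for_write`, etc.); it assumes the "soon after" write is some deferred mechanism (modelled here as an explicit `demo_flush_dirty_bit` call), and a crash between open and that flush is exactly the bounded window of lost updates mentioned above.

```c
#include <stdbool.h>

/* Hypothetical in-memory view of an inode for this sketch. */
struct demo_inode {
    unsigned int write_opens;  /* active opens for write */
    bool unflushed_data;       /* dirty data not yet written to disk */
    bool ondisk_dirty;         /* the proposed on-disk "dirty" bit */
    bool dirty_bit_pending;    /* set in memory, to be persisted later */
};

/* open-for-write: no synchronous I/O, the bit is only queued. */
static void demo_open_for_write(struct demo_inode *ino)
{
    ino->write_opens++;
    if (!ino->ondisk_dirty)
        ino->dirty_bit_pending = true;
}

/* The deferred "soon after" write that persists the bit. */
static void demo_flush_dirty_bit(struct demo_inode *ino)
{
    if (ino->dirty_bit_pending) {
        ino->ondisk_dirty = true;
        ino->dirty_bit_pending = false;
    }
}

static void demo_close_for_write(struct demo_inode *ino)
{
    if (ino->write_opens > 0)
        ino->write_opens--;
}

/* Final inode write-back: clear the bit only when no writer remains
 * and all data has been flushed. */
static void demo_write_inode(struct demo_inode *ino)
{
    if (ino->write_opens == 0 && !ino->unflushed_data) {
        ino->ondisk_dirty = false;
        ino->dirty_bit_pending = false;
    }
}
```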

When loading an inode from disk, if the dirty flag is set then the
difference between the current time and the on-disk ctime (in
nanoseconds) could be added to the version number.
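The load-time adjustment might be sketched as follows. The names are hypothetical, and `demo_ts` is a stand-in for a seconds/nanoseconds timestamp (like `struct timespec`) so the example stays self-contained. The sketch assumes the jump is meant to exceed any version handed out before the crash, which holds if i_version advanced at most once per nanosecond since the on-disk ctime. Bumping by one when the clock appears to have gone backwards is our own assumption, not part of the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Stand-in for a seconds + nanoseconds timestamp. */
struct demo_ts {
    int64_t sec;
    int64_t nsec;
};

/* Hypothetical on-disk inode fields relevant to the proposal. */
struct demo_disk_inode {
    bool dirty;             /* proposed on-disk dirty bit */
    struct demo_ts ctime;   /* last persisted status-change time */
    uint64_t version;       /* last persisted i_version */
};

static int64_t demo_ts_to_ns(const struct demo_ts *ts)
{
    return ts->sec * DEMO_NSEC_PER_SEC + ts->nsec;
}

/* On inode load: a clean inode keeps its version; a dirty one is moved
 * forward by the nanoseconds elapsed since its on-disk ctime. */
static uint64_t demo_load_version(const struct demo_disk_inode *di,
                                  const struct demo_ts *now)
{
    if (!di->dirty)
        return di->version;
    int64_t delta = demo_ts_to_ns(now) - demo_ts_to_ns(&di->ctime);
    if (delta < 1)
        delta = 1;  /* assumption: clock skew still forces a bump */
    return di->version + (uint64_t)delta;
}
```

For example, with an on-disk ctime of 100 s, a load at 101 s + 500 ns and a persisted version of 42, a dirty inode would come back with version 42 + 1000000500.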

But maybe that is too complex for the gain.

NeilBrown