Message-id: <166267685105.30452.17324304715046746056@noble.neil.brown.name>
Date: Fri, 09 Sep 2022 08:40:51 +1000
From: "NeilBrown" <neilb@...e.de>
To: "Theodore Ts'o" <tytso@....edu>
Cc: "Jan Kara" <jack@...e.cz>, "Jeff Layton" <jlayton@...nel.org>,
"J. Bruce Fields" <bfields@...ldses.org>, adilger.kernel@...ger.ca,
djwong@...nel.org, david@...morbit.com, trondmy@...merspace.com,
viro@...iv.linux.org.uk, zohar@...ux.ibm.com, xiubli@...hat.com,
chuck.lever@...cle.com, lczerner@...hat.com, brauner@...nel.org,
fweimer@...hat.com, linux-man@...r.kernel.org,
linux-api@...r.kernel.org, linux-btrfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
ceph-devel@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
STATX_INO_VERSION field
On Fri, 09 Sep 2022, Theodore Ts'o wrote:
> On Thu, Sep 08, 2022 at 10:33:26AM +0200, Jan Kara wrote:
> > It boils down to the fact that we don't want to call mark_inode_dirty()
> > from IOCB_NOWAIT path because for lots of filesystems that means journal
> > operation and there are high chances that may block.
> >
> > Presumably we could treat inode dirtying after i_version change similarly
> > to how we handle timestamp updates with lazytime mount option (i.e., not
> > dirty the inode immediately but only with a delay) but then the time window
> > for i_version inconsistencies due to a crash would be much larger.
>
> Perhaps this is a radical suggestion, but there seems to be a lot of
> the problems which are due to the concern "what if the file system
> crashes" (and so we need to worry about making sure that any
> increments to i_version MUST be persisted after it is incremented).
>
> Well, if we assume that unclean shutdowns are rare, then perhaps we
> shouldn't be optimizing for that case. So.... what if a file system
> had a counter which got incremented each time its journal is replayed
> representing an unclean shutdown. That shouldn't happen often, but if
> it does, there might be any number of i_version updates that may have
> gotten lost. So in that case, the NFS client should invalidate all of
> its caches.
I was also thinking that the filesystem could help close that gap, but I
didn't like the "whole filesystem is dirty" approach.
I instead imagined a "dirty" bit in the on-disk inode which was set soon
after any open-for-write and cleared when the inode was finally written
after there are no active opens and no unflushed data.
The "soon after" would set a maximum window on possible lost version
updates (which people seem to be comfortable with) without imposing a
sync IO operation on open (for the first write).
When loading an inode from disk, if the dirty flag was set then the
difference between current time and on-disk ctime (in nanoseconds) could
be added to the version number.
But maybe that is too complex for the gain.
NeilBrown