[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220923095653.5c63i2jgv52j3zqp@quack3>
Date: Fri, 23 Sep 2022 11:56:53 +0200
From: Jan Kara <jack@...e.cz>
To: Jeff Layton <jlayton@...nel.org>
Cc: Dave Chinner <david@...morbit.com>, Theodore Ts'o <tytso@....edu>,
NeilBrown <neilb@...e.de>,
Trond Myklebust <trondmy@...merspace.com>,
"bfields@...ldses.org" <bfields@...ldses.org>,
"zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
"djwong@...nel.org" <djwong@...nel.org>,
"brauner@...nel.org" <brauner@...nel.org>,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
"fweimer@...hat.com" <fweimer@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"chuck.lever@...cle.com" <chuck.lever@...cle.com>,
"linux-man@...r.kernel.org" <linux-man@...r.kernel.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
"jack@...e.cz" <jack@...e.cz>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"xiubli@...hat.com" <xiubli@...hat.com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
"lczerner@...hat.com" <lczerner@...hat.com>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
"linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
STATX_INO_VERSION field
On Thu 22-09-22 16:18:02, Jeff Layton wrote:
> On Thu, 2022-09-22 at 06:18 -0400, Jeff Layton wrote:
> > On Thu, 2022-09-22 at 07:41 +1000, Dave Chinner wrote:
> > > e.g. The NFS server can track the i_version values when the NFSD
> > > syncs/commits a given inode. The nfsd can sample i_version it when
> > > calls ->commit_metadata or flushed data on the inode, and then when
> > > it peeks at i_version when gathering post-op attrs (or any other
> > > getattr op) it can decide that there is too much in-memory change
> > > (e.g. 10,000 counts since last sync) and sync the inode.
> > >
> > > i.e. the NFS server can trivially cap the maximum number of
> > > uncommitted NFS change attr bumps it allows to build up in memory.
> > > At that point, the NFS server has a bound "maximum write count" that
> > > can be used in conjunction with the xattr based crash counter to
> > > determine how the change_attr is bumped by the crash counter.
> >
> > Well, not "trivially". This is the bit where we have to grow struct
> > inode (or the fs-specific inode), as we'll need to know what the latest
> > on-disk value is for the inode.
> >
> > I'm leaning toward doing this on the query side. Basically, when nfsd
> > goes to query the i_version, it'll check the delta between the current
> > version and the latest one on disk. If it's bigger than X then we'd just
> > return NFS4ERR_DELAY to the client.
> >
> > If the delta is >X/2, maybe it can kick off a workqueue job or something
> > that calls write_inode with WB_SYNC_ALL to try to get the thing onto the
> > platter ASAP.
>
> Still looking at this bit too. Probably we can just kick off a
> WB_SYNC_NONE filemap_fdatawrite at that point and hope for the best?
"Hope" is not a great assurance regarding data integrity ;) Anyway, it
depends on how you imagine the "i_version on disk" is going to be
maintained. It could be maintained by NFSD inside commit_inode_metadata() -
fetch current i_version value before asking filesystem for the sync and by the
time commit_metadata() returns we know that value is on disk. If we detect the
current - on_disk is > X/2, we call commit_inode_metadata() and we are
done. It is not even *that* expensive because usually filesystems optimize
away unnecessary IO when the inode didn't change since last time it got
synced.
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists