[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <01ae01d8cf5d$023474d0$069d5e70$@mindspring.com>
Date: Fri, 23 Sep 2022 07:58:55 -0700
From: "Frank Filz" <ffilzlnx@...dspring.com>
To: "'Jeff Layton'" <jlayton@...nel.org>,
"'Trond Myklebust'" <trondmy@...merspace.com>, <jack@...e.cz>
Cc: <zohar@...ux.ibm.com>, <djwong@...nel.org>, <brauner@...nel.org>,
<linux-xfs@...r.kernel.org>, <bfields@...ldses.org>,
<linux-api@...r.kernel.org>, <neilb@...e.de>,
<david@...morbit.com>, <fweimer@...hat.com>,
<linux-kernel@...r.kernel.org>, <chuck.lever@...cle.com>,
<linux-man@...r.kernel.org>, <linux-nfs@...r.kernel.org>,
<linux-ext4@...r.kernel.org>, <tytso@....edu>,
<viro@...iv.linux.org.uk>, <xiubli@...hat.com>,
<linux-fsdevel@...r.kernel.org>, <adilger.kernel@...ger.ca>,
<lczerner@...hat.com>, <ceph-devel@...r.kernel.org>,
<linux-btrfs@...r.kernel.org>
Subject: RE: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field
> -----Original Message-----
> From: Jeff Layton [mailto:jlayton@...nel.org]
> Sent: Friday, September 23, 2022 6:50 AM
> To: Trond Myklebust <trondmy@...merspace.com>; jack@...e.cz
> Cc: zohar@...ux.ibm.com; djwong@...nel.org; brauner@...nel.org; linux-
> xfs@...r.kernel.org; bfields@...ldses.org; linux-api@...r.kernel.org;
> neilb@...e.de; david@...morbit.com; fweimer@...hat.com; linux-
> kernel@...r.kernel.org; chuck.lever@...cle.com; linux-man@...r.kernel.org;
> linux-nfs@...r.kernel.org; linux-ext4@...r.kernel.org; tytso@....edu;
> viro@...iv.linux.org.uk; xiubli@...hat.com; linux-fsdevel@...r.kernel.org;
> adilger.kernel@...ger.ca; lczerner@...hat.com; ceph-devel@...r.kernel.org;
> linux-btrfs@...r.kernel.org
> Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
> STATX_INO_VERSION field
>
> On Fri, 2022-09-23 at 13:44 +0000, Trond Myklebust wrote:
> > On Fri, 2022-09-23 at 11:56 +0200, Jan Kara wrote:
> > > On Thu 22-09-22 16:18:02, Jeff Layton wrote:
> > > > On Thu, 2022-09-22 at 06:18 -0400, Jeff Layton wrote:
> > > > > On Thu, 2022-09-22 at 07:41 +1000, Dave Chinner wrote:
> > > > > > e.g. The NFS server can track the i_version values when the
> > > > > > NFSD syncs/commits a given inode. The nfsd can sample
> > > > > > i_version it when calls ->commit_metadata or flushed data on
> > > > > > the inode, and then when it peeks at i_version when gathering
> > > > > > post-op attrs (or any other getattr op) it can decide that
> > > > > > there is too much in-memory change (e.g. 10,000 counts since
> > > > > > last sync) and sync the inode.
> > > > > >
> > > > > > i.e. the NFS server can trivially cap the maximum number of
> > > > > > uncommitted NFS change attr bumps it allows to build up in
> > > > > > memory.
> > > > > > At that point, the NFS server has a bound "maximum write count"
> > > > > > that
> > > > > > can be used in conjunction with the xattr based crash counter
> > > > > > to determine how the change_attr is bumped by the crash
> > > > > > counter.
> > > > >
> > > > > Well, not "trivially". This is the bit where we have to grow
> > > > > struct inode (or the fs-specific inode), as we'll need to know
> > > > > what the latest on-disk value is for the inode.
> > > > >
> > > > > I'm leaning toward doing this on the query side. Basically, when
> > > > > nfsd goes to query the i_version, it'll check the delta between
> > > > > the current version and the latest one on disk. If it's bigger
> > > > > than X then we'd just return NFS4ERR_DELAY to the client.
> > > > >
> > > > > If the delta is >X/2, maybe it can kick off a workqueue job or
> > > > > something that calls write_inode with WB_SYNC_ALL to try to get
> > > > > the thing onto the platter ASAP.
> > > >
> > > > Still looking at this bit too. Probably we can just kick off a
> > > > WB_SYNC_NONE filemap_fdatawrite at that point and hope for the
> > > > best?
> > >
> > > "Hope" is not a great assurance regarding data integrity ;) Anyway,
> > > it depends on how you imagine the "i_version on disk" is going to be
> > > maintained. It could be maintained by NFSD inside
> > > commit_inode_metadata() -
> > > fetch current i_version value before asking filesystem for the sync
> > > and by the time commit_metadata() returns we know that value is on
> > > disk. If we detect the current - on_disk is > X/2, we call
> > > commit_inode_metadata() and we are done. It is not even *that*
> > > expensive because usually filesystems optimize away unnecessary IO
> > > when the inode didn't change since last time it got synced.
> > >
> >
> > Note that these approaches requiring 3rd party help in order to track
> > i_version integrity across filesystem crashes all make the idea of
> > adding i_version to statx() a no-go.
> >
> > It is one thing for knfsd to add specialised machinery for integrity
> > checking, but if all applications need to do so, then they are highly
> > unlikely to want to adopt this attribute.
> >
> >
>
> Absolutely. That is the downside of this approach, but the priority here
has
> always been to improve nfsd. If we don't get the ability to present this
info via
> statx, then so be it. Later on, I suppose we can move that handling into
the
> kernel in some fashion if we decide it's worthwhile.
>
> That said, not having this in statx makes it more difficult to test
i_version
> behavior. Maybe we can add a generic ioctl for that in the interim?
Having i_version in statx would be really nice for nfs-ganesha. I would
consider doing the extra integrity stuff or we may in some cases be able to
rely on the filesystem, but in any case, i_version would be an improvement
over using ctime or mtime for a change attribute.
Frank
Powered by blists - more mailing lists