[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3a939c44746e0ab98fb1274ffa33b82dcbc789d1.camel@hammerspace.com>
Date: Sat, 27 Aug 2022 17:06:42 +0000
From: Trond Myklebust <trondmy@...merspace.com>
To: "djwong@...nel.org" <djwong@...nel.org>,
"jlayton@...nel.org" <jlayton@...nel.org>
CC: "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
"xiubli@...hat.com" <xiubli@...hat.com>,
"brauner@...nel.org" <brauner@...nel.org>,
"dwysocha@...hat.com" <dwysocha@...hat.com>,
"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
"david@...morbit.com" <david@...morbit.com>,
"neilb@...e.de" <neilb@...e.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"chuck.lever@...cle.com" <chuck.lever@...cle.com>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"tytso@....edu" <tytso@....edu>,
"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
"jack@...e.cz" <jack@...e.cz>,
"amir73il@...il.com" <amir73il@...il.com>,
"linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"lczerner@...hat.com" <lczerner@...hat.com>,
"adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: [PATCH v3 4/7] xfs: don't bump the i_version on an atime update
in xfs_vn_update_time
On Sat, 2022-08-27 at 12:10 -0400, Jeff Layton wrote:
> On Sat, 2022-08-27 at 16:03 +0000, Trond Myklebust wrote:
> > On Sat, 2022-08-27 at 08:46 -0700, Darrick J. Wong wrote:
> > > On Sat, Aug 27, 2022 at 09:14:30AM -0400, Jeff Layton wrote:
> > > > On Sat, 2022-08-27 at 11:01 +0300, Amir Goldstein wrote:
> > > > > On Sat, Aug 27, 2022 at 10:26 AM Amir Goldstein
> > > > > <amir73il@...il.com> wrote:
> > > > > >
> > > > > > On Sat, Aug 27, 2022 at 12:49 AM Jeff Layton
> > > > > > <jlayton@...nel.org> wrote:
> > > > > > >
> > > > > > > xfs will update the i_version when updating only the
> > > > > > > atime
> > > > > > > value, which
> > > > > > > is not desirable for any of the current consumers of
> > > > > > > i_version. Doing so
> > > > > > > leads to unnecessary cache invalidations on NFS and extra
> > > > > > > measurement
> > > > > > > activity in IMA.
> > > > > > >
> > > > > > > Add a new XFS_ILOG_NOIVER flag, and use that to indicate
> > > > > > > that
> > > > > > > the
> > > > > > > transaction should not update the i_version. Set that
> > > > > > > value
> > > > > > > in
> > > > > > > xfs_vn_update_time if we're only updating the atime.
> > > > > > >
> > > > > > > Cc: Dave Chinner <david@...morbit.com>
> > > > > > > Cc: NeilBrown <neilb@...e.de>
> > > > > > > Cc: Trond Myklebust <trondmy@...merspace.com>
> > > > > > > Cc: David Wysochanski <dwysocha@...hat.com>
> > > > > > > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > > > > > > ---
> > > > > > > fs/xfs/libxfs/xfs_log_format.h | 2 +-
> > > > > > > fs/xfs/libxfs/xfs_trans_inode.c | 2 +-
> > > > > > > fs/xfs/xfs_iops.c | 11 +++++++++--
> > > > > > > 3 files changed, 11 insertions(+), 4 deletions(-)
> > > > > > >
> > > > > > > Dave has NACK'ed this patch, but I'm sending it as a way
> > > > > > > to
> > > > > > > illustrate
> > > > > > > the problem. I still think this approach should at least
> > > > > > > fix
> > > > > > > the worst
> > > > > > > problems with atime updates being counted. We can look to
> > > > > > > carve out
> > > > > > > other "spurious" i_version updates as we identify them.
> > > > > > >
> > > > > >
> > > > > > AFAIK, "spurious" is only inode blocks map changes due to
> > > > > > writeback
> > > > > > of dirty pages. Anybody know about other cases?
> > > > > >
> > > > > > Regarding inode blocks map changes, first of all, I don't
> > > > > > think
> > > > > > that there is
> > > > > > any practical loss from invalidating NFS client cache on
> > > > > > dirty
> > > > > > data writeback,
> > > > > > because NFS server should be serving cold data most of the
> > > > > > time.
> > > > > > If there are a few unneeded cache invalidations they would
> > > > > > only
> > > > > > be temporary.
> > > > > >
> > > > >
> > > > > Unless there is an issue with a writer NFS client that
> > > > > invalidates its
> > > > > own attribute
> > > > > caches on server data writeback?
> > > > >
> > > >
> > > > The client just looks at the file attributes (of which
> > > > i_version is
> > > > but
> > > > one), and if certain attributes have changed (mtime, ctime,
> > > > i_version,
> > > > etc...) then it invalidates its cache.
> > > >
> > > > In the case of blocks map changes, could that mean a difference
> > > > in
> > > > the
> > > > observable sparse regions of the file? If so, then a READ_PLUS
> > > > before
> > > > the change and a READ_PLUS after could give different results.
> > > > Since
> > > > that difference is observable by the client, I'd think we'd
> > > > want to
> > > > bump
> > > > i_version for that anyway.
> > >
> > > How /is/ READ_PLUS supposed to detect sparse regions, anyway? I
> > > know
> > > that's been the subject of recent debate. At least as far as XFS
> > > is
> > > concerned, a file range can go from hole -> delayed allocation
> > > reservation -> unwritten extent -> (actual writeback) -> written
> > > extent.
> > > The dance became rather more complex when we added COW. If any
> > > of
> > > that
> > > will make a difference for READ_PLUS, then yes, I think you'd
> > > want
> > > file
> > > writeback activities to bump iversion to cause client
> > > invalidations,
> > > like (I think) Dave said.
> > >
> > > The fs/iomap/ implementation of SEEK_DATA/SEEK_HOLE reports data
> > > for
> > > written and delalloc extents; and an unwritten extent will report
> > > data
> > > for any pagecache it finds.
> > >
> >
> > READ_PLUS should never return anything different than a read()
> > system
> > call would return for any given area. The way it reports sparse
> > regions
> > vs. data regions is purely an RPC formatting convenience.
> >
> > The only point to note about NFS READ and READ_PLUS is that because
> > the
> > client is forced to send multiple RPC calls if the user is trying
> > to
> > read a region that is larger than the 'rsize' value, it is possible
> > that these READ/READ_PLUS calls may be processed out of order, and
> > so
> > the result may end up looking different than if you had executed a
> > read() call for the full region directly on the server.
> > However each individual READ / READ_PLUS reply should look as if
> > the
> > user had called read() on that rsize-sized section of the file.
> > > >
>
> Yeah, thinking about it some more, simply changing the block
> allocation
> is not something that should affect the ctime, so we probably don't
> want
> to bump i_version on it. It's an implicit change, IOW, not an
> explicit
> one.
As you say, it is unfortunate that XFS does this, and it is unfortunate
that it then changes the blocks allocated attribute post-fsync(), since
all that does cause confusion for certain applications.
However I agree 100% that this is an implicit change that is driven by
the filesystem and not the user application. Hence it is not an action
that needs to be recorded with a change attribute bump.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com
Powered by blists - more mailing lists