linux-ext4 - Re: [PATCH v3 4/7] xfs: don't bump the i_version on an atime update in xfs_vn_update

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220830000851.GV3600936@dread.disaster.area>
Date:   Tue, 30 Aug 2022 10:08:51 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Jeff Layton <jlayton@...nel.org>
Cc:     Amir Goldstein <amir73il@...il.com>,
        Trond Myklebust <trondmy@...merspace.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "xiubli@...hat.com" <xiubli@...hat.com>,
        "neilb@...e.de" <neilb@...e.de>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "dwysocha@...hat.com" <dwysocha@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "chuck.lever@...cle.com" <chuck.lever@...cle.com>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "tytso@....edu" <tytso@....edu>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "jack@...e.cz" <jack@...e.cz>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "lczerner@...hat.com" <lczerner@...hat.com>,
        "adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: [PATCH v3 4/7] xfs: don't bump the i_version on an atime update
 in xfs_vn_update_time

On Mon, Aug 29, 2022 at 06:33:48AM -0400, Jeff Layton wrote:
> On Mon, 2022-08-29 at 15:48 +1000, Dave Chinner wrote:
> > > 
> > > The race window ought to be relatively small, and this wouldn't result
> > > in incorrect behavior that you'd notice (other than loss of
> > > performance), but it's not ideal. We're doing more on-the-wire reads
> > > than are necessary in this case.
> > > 
> > > It would be nice to have it not do that. If we end up taking this patch
> > > to make it elide the i_version bumps on atime updates, we may be able to
> > > set the the NOIVER flag in other cases as well, and avoid some of these
> > > extra bumps.
> > 
> > 
> > <sigh>
> > 
> > Please don't make me repeat myself for the third time.
> > 
> > Once we have decided on a solid, unchanging definition for the
> > *statx user API variable*, we'll implement a new on-disk field that
> > provides this information.  We will document it in the on-disk
> > specification as "this is how di_iversion behaves" so that it is
> > clear to everyone parsing the on-disk format or writing their own
> > XFS driver how to implement it and when to expect it to
> > change.
> > 
> > Then we can add a filesystem and inode feature flags that say "inode
> > has new iversion" and we use that to populate the kernel iversion
> > instead of di_changecount. We keep di_changecount exactly the way it
> > is now for the applications and use cases we already have for that
> > specific behaviour. If the kernel and/or filesystem don't support
> > the new di_iversion field, then we'll use di_changecount as it
> > currently exists for the kernel iversion code.
> > 
> 
> Aside from NFS and IMA, what applications are dependent on the current
> definition and how do they rely on i_version today?

I've answered this multiple times already: the di_changecount
behaviour is defined in the on-disk specification and hence we
*cannot change the behaviour* without changing the on-disk format
specification.

Apart from the forensics aspect of the change counter (which nobody
but us XFS developers seem to understand just how damn important
this is), there are *many* third party applications that parse the
XFS on-disk format directly. This:

https://codesearch.debian.net/search?q=XFS_SB_VERSION_DIRV2&literal=1

Shows grub2, libparted, syslinux, partclone and fsarchiver as
knowing about XFS on-disk superblock flags that tell them what
format the directory structure is in. That alone is enough to
indicate they parse on-disk inodes directly, and hence may expect
di_changecount to have specific meaning and use it to detect
unexpected changes to files/directories they care about.

If I go looking for XFS_SB_MAGIC, I find things like libblkid,
klibc, qemu, Xen, testdisk, gpart, and virtualbox all parse the
on-disk superblocks directly from the block device, too. They also
rely directly on XFS developers ensuring there are no silent
incomaptible changes to the on disk format.

I also know of many other utilities that people and companies have
written that parse the on disk format directly from userspace. The
functions these perform include low level storage management tools,
copying and managing disk images (e.g. offline configuration for
cluster deployments), data recovery tools that scrape all the data
out of broken filesystems, etc.

These applications are reliant on the guarantee we provide that the
on-disk format will not silently change and that behaviour/structure
can always easily be discovered by feature flags in the superblock
and/or inodes.

IOWs, just because there aren't obvious "traditional" application on
top of the kernel filesystem that consumes the in-memory kernel
iversion field, it does not mean that the defined behaviour of the
on-disk di_changecount field is not used or relied on by other tools
that work directly on the on-disk format.

You might be right that NFS doesn't care about this, but the point
remains that NFS does not control the XFS on-disk format, nor does
the fact that what NFS wants from the change attribute has changed
over time override the fact that maintaining XFS on-disk format
compatibility is the responsibility of XFS developers. We're willing
to change the on-disk format to support whatever the new definition
of the statx change attribute ends up being, and that should be the
end of the discussion.

> > Keep in mind that we've been doing dynamic inode format updates in
> > XFS for a couple of decades - users don't even have to be aware that
> > they need to perform format upgrades because often they just happen
> > whenever an inode is accessed. IOWs, just because we have to change
> > the on-disk format to support this new iversion definition, it
> > doesn't mean users have to reformat filesystems before the new
> > feature can be used.
> > 
> > Hence, over time, as distros update kernels, the XFS iversion
> > behaviour will change automagically as we update inodes in existing
> > filesystems as they are accessed to add and then use the new
> > di_iversion field for the VFS change attribute field instead of the
> > di_changecount field...
> > 
> 
> If you want to create a whole new on-disk field for this, then that's
> your prerogative, but before you do that, I'd like to better understand
> why and how the constraints on this field changed.
> 
> The original log message from the commit that added a change counter
> (below) stated that you were adding it for network filesystems like NFS.
> When did this change and why?

It never changed. I'll repeat what I've already explained twice
before:

https://lore.kernel.org/linux-xfs/20220818030048.GE3600936@dread.disaster.area/
https://lore.kernel.org/linux-xfs/20220818033731.GF3600936@dread.disaster.area/

tl; dr: NFS requirements were just one of *many* we had at the time
for an atomic persistent change counter.

The fact is that NFS users are just going to have to put up with
random cache invalidations on XFS for a while longer. Nobody noticed
this and/or cared about this enough to raise it as an issue for the
past decade, so waiting another few months for upstream XFS to
change to a different on-disk format for the NFS/statx change
attribute isn't a big deal.

-Dave.
-- 
Dave Chinner
david@...morbit.com