lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220822233231.GJ3600936@dread.disaster.area>
Date:   Tue, 23 Aug 2022 09:32:31 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Jeff Layton <jlayton@...nel.org>
Cc:     Mimi Zohar <zohar@...ux.ibm.com>, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-integrity@...r.kernel.org,
        linux-nfs@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-ext4@...r.kernel.org, NeilBrown <neilb@...e.de>,
        Trond Myklebust <trondmy@...merspace.com>
Subject: Re: [PATCH] iversion: update comments with info about atime updates

On Mon, Aug 22, 2022 at 02:22:20PM -0400, Jeff Layton wrote:
> diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> index 3bfebde5a1a6..524abd372100 100644
> --- a/include/linux/iversion.h
> +++ b/include/linux/iversion.h
> @@ -9,8 +9,8 @@
>   * ---------------------------
>   * The change attribute (i_version) is mandated by NFSv4 and is mostly for
>   * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> - * appear different to observers if there was a change to the inode's data or
> - * metadata since it was last queried.
> + * appear different to observers if there was an explicit change to the inode's
> + * data or metadata since it was last queried.
>   *
>   * Observers see the i_version as a 64-bit number that never decreases. If it
>   * remains the same since it was last checked, then nothing has changed in the
> @@ -18,6 +18,13 @@
>   * anything about the nature or magnitude of the changes from the value, only
>   * that the inode has changed in some fashion.
>   *
> + * Note that atime updates due to reads or similar activity do not represent

What does "or similar activity" mean?

This whole atime vs iversion issue is arising because "or any
metadata change" was interpretted literally by filesystems to mean
"any metadata change" without caveats or exclusions. Now you're both
changing that definition and making things *worse* by adding
explicit wiggle-room for future changes in behaviour to persistent
change counter behaviour.

iversion is going to be exposed to userspace, so we *can't change
the definition in future* because behaviour is bound by "changes may
break userspace apps" constraints. IOWs, we cannot justify changes
in behaviour with "but there are only in-kernel users" like is being
done for this change.

It's only a matter of time before someone is going to complain about
the fact that filesystems bump the change counter for internal
metadata modifications as they make changes to the persistent state
of data and metadata. These existing behaviours will almost
certainly causes visible NFS quirks due to unexpected
iversion bumps.

In case you didn't realise, XFS can bump iversion 500+ times for a
single 1MB write() on a 4kB block size filesytem, and only one of
them is initial write() system call that copies the data into the
page cache. The other 500+ are all the extent allocation and
manipulation transactions that we might run when persisting the data
to disk tens of seconds later. This is how iversion on XFS has
behaved for the past decade.

Right now, both ext4 and XFS conform to the exact definition that is
in this file. Trond has outlines that NFS wants iversion to behave
exactly like c/mtime changes, but that is fundamentally different to
how both ext4 and XFS have implemented the persistent change
counters.

IOWs, if we are going to start randomly excluding specific metadata
from the iversion API, then we need a full definition of exactly
what operations are supposed to trigger an iversion bump.
API-design-by-application-deficiency-whack-a-mole is not appropriate
for persistent structures, nor is it appropriate for information
that is going to be exposed to userspace via the kernel ABI.

Therefore, before we change behaviour in the filesystems or expose
it to userspace, we need to define *exactly* what changes are
covered by iversion. Once we have that definition, then we can
modify the filesytems appropriately to follow the required
definition and only change the persistent iversion counter when the
definition says we should change it.

While Trond's description is a useful definition from the
application perspective - and I'm not opposed to making it behave
like that (i.e. iversion only bumped with c/mtime changes) - it
requires a fundamentally different implementation in filesystems.

We must no longer capture *every metadata change* as we are
currently doing as per the current specification, and instead only
capture *application metadata changes* only.  i.e.  we are going to
need an on-disk format change of some kind because the two
behaviours are not compatible.

Further, if iversion is going to be extended to userspace, we most
definitely need to decouple it from our internal "change on every
metadata change" behaviour. We change how we persist metadata to
disk over time and if we don't abstract that away from the
persistent change counter we expose to userspace then this will lead
to random user visible changes in iversion behaviour in future.

Any way I look at it, we're at the point where filesystems now need
a real, fixed definition of what changes require an iversion bump
and which don't. The other option is that move the iversion bumping
completely up into the VFS where it happens consistently for every
filesystem whether they persist it or not. We also need a set of
test code for fstests that check that iversion performs as expected
and to the specification that the VFS defines for statx.

Either way we chose, one of these options are the only way that we
will end up with a consistent implementation of a change counter
across all the filesystems. And, quite frankly, that's exactly is
needed if we are going to present this information to userspace
forever more.

-Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ