

Message-ID: <20220923095653.5c63i2jgv52j3zqp@quack3>
Date:   Fri, 23 Sep 2022 11:56:53 +0200
From:   Jan Kara <jack@...e.cz>
To:     Jeff Layton <jlayton@...nel.org>
Cc:     Dave Chinner <david@...morbit.com>, Theodore Ts'o <tytso@....edu>,
        NeilBrown <neilb@...e.de>,
        Trond Myklebust <trondmy@...merspace.com>,
        "bfields@...ldses.org" <bfields@...ldses.org>,
        "zohar@...ux.ibm.com" <zohar@...ux.ibm.com>,
        "djwong@...nel.org" <djwong@...nel.org>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
        "fweimer@...hat.com" <fweimer@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "chuck.lever@...cle.com" <chuck.lever@...cle.com>,
        "linux-man@...r.kernel.org" <linux-man@...r.kernel.org>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "jack@...e.cz" <jack@...e.cz>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "xiubli@...hat.com" <xiubli@...hat.com>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
        "lczerner@...hat.com" <lczerner@...hat.com>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
 STATX_INO_VERSION field

On Thu 22-09-22 16:18:02, Jeff Layton wrote:
> On Thu, 2022-09-22 at 06:18 -0400, Jeff Layton wrote:
> > On Thu, 2022-09-22 at 07:41 +1000, Dave Chinner wrote:
> > > e.g. The NFS server can track the i_version values when the NFSD
> > > syncs/commits a given inode. The nfsd can sample i_version when it
> > > calls ->commit_metadata or flushes data on the inode, and then when
> > > it peeks at i_version when gathering post-op attrs (or any other
> > > getattr op) it can decide that there is too much in-memory change
> > > (e.g. 10,000 counts since last sync) and sync the inode.
> > > 
> > > i.e. the NFS server can trivially cap the maximum number of
> > > uncommitted NFS change attr bumps it allows to build up in memory.
> > > At that point, the NFS server has a bound "maximum write count" that
> > > can be used in conjunction with the xattr based crash counter to
> > > determine how the change_attr is bumped by the crash counter.
> > 
> > Well, not "trivially". This is the bit where we have to grow struct
> > inode (or the fs-specific inode), as we'll need to know what the latest
> > on-disk value is for the inode.
> > 
> > I'm leaning toward doing this on the query side. Basically, when nfsd
> > goes to query the i_version, it'll check the delta between the current
> > version and the latest one on disk. If it's bigger than X then we'd just
> > return NFS4ERR_DELAY to the client.
> > 
> > If the delta is >X/2, maybe it can kick off a workqueue job or something
> > that calls write_inode with WB_SYNC_ALL to try to get the thing onto the
> > platter ASAP.
> 
> Still looking at this bit too. Probably we can just kick off a
> WB_SYNC_NONE filemap_fdatawrite at that point and hope for the best?

"Hope" is not a great assurance regarding data integrity ;) Anyway, it
depends on how you imagine the "i_version on disk" is going to be
maintained. It could be maintained by NFSD inside commit_inode_metadata():
fetch the current i_version value before asking the filesystem for the sync,
and by the time commit_metadata() returns we know that value is on disk. If
we detect that current - on_disk is > X/2, we call commit_inode_metadata()
and we are done. It is not even *that* expensive, because filesystems usually
optimize away unnecessary IO when the inode hasn't changed since the last
time it was synced.
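
To make the idea concrete, here is a rough userspace model of it, not kernel
code. Every name in it (struct model_inode, the model_*() helpers,
MAX_UNSYNCED) is hypothetical and only stands in for the real nfsd and
filesystem pieces. The value chosen for X is borrowed from the "10,000
counts" example earlier in the thread. It is single-threaded, so it shows only
the bookkeeping, not the locking a real implementation would need.

```c
#include <assert.h>
#include <stdint.h>

/* "X" from the discussion; 10,000 is only the example value quoted above. */
#define MAX_UNSYNCED 10000u

/* Hypothetical model of an inode; these are not real kernel structures. */
struct model_inode {
	uint64_t i_version;      /* in-memory change counter */
	uint64_t disk_version;   /* what the model "filesystem" has persisted */
	uint64_t synced_version; /* last value the server knows is on disk */
	unsigned int commits;    /* how many syncs were actually issued */
};

/* Any change to the inode bumps the in-memory counter. */
static void model_bump(struct model_inode *inode)
{
	inode->i_version++;
}

/* Stand-in for the filesystem's ->commit_metadata: persist the inode. */
static void model_fs_commit(struct model_inode *inode)
{
	inode->disk_version = inode->i_version;
	inode->commits++;
}

/*
 * Sample i_version *before* asking for the sync. Once the commit returns,
 * at least the sampled value is on disk, even if the counter was bumped
 * again while the sync was in flight.
 */
static void model_commit_inode_metadata(struct model_inode *inode)
{
	uint64_t sampled = inode->i_version;

	model_fs_commit(inode);
	assert(inode->disk_version >= sampled);
	inode->synced_version = sampled;
}

/* Query side: sync first if too much unsynced change has accumulated. */
static uint64_t model_query_version(struct model_inode *inode)
{
	assert(inode->i_version >= inode->synced_version);
	if (inode->i_version - inode->synced_version > MAX_UNSYNCED / 2)
		model_commit_inode_metadata(inode);
	return inode->i_version;
}
```

With this model, 5000 bumps followed by a query do not trigger a sync, the
5001st bump does, and a second query right after that is free because the
delta is back to zero.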

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR