lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20230920154527.pkwot4nu2nzrnamd@quack3>
Date:   Wed, 20 Sep 2023 17:45:27 +0200
From:   Jan Kara <jack@...e.cz>
To:     Jeff Layton <jlayton@...nel.org>
Cc:     Jan Kara <jack@...e.cz>, Christian Brauner <brauner@...nel.org>,
        Bruno Haible <bruno@...sp.org>,
        Xi Ruoyao <xry111@...uxfromscratch.org>, bug-gnulib@....org,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Eric Van Hensbergen <ericvh@...nel.org>,
        Latchesar Ionkov <lucho@...kov.net>,
        Dominique Martinet <asmadeus@...ewreck.org>,
        Christian Schoenebeck <linux_oss@...debyte.com>,
        David Howells <dhowells@...hat.com>,
        Marc Dionne <marc.dionne@...istor.com>,
        Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
        David Sterba <dsterba@...e.com>, Xiubo Li <xiubli@...hat.com>,
        Ilya Dryomov <idryomov@...il.com>,
        Jan Harkes <jaharkes@...cmu.edu>, coda@...cmu.edu,
        Tyler Hicks <code@...icks.com>, Gao Xiang <xiang@...nel.org>,
        Chao Yu <chao@...nel.org>, Yue Hu <huyue2@...lpad.com>,
        Jeffle Xu <jefflexu@...ux.alibaba.com>,
        Namjae Jeon <linkinjeon@...nel.org>,
        Sungjong Seo <sj1557.seo@...sung.com>,
        Jan Kara <jack@...e.com>, Theodore Ts'o <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
        Miklos Szeredi <miklos@...redi.hu>,
        Bo b Peterson <rpeterso@...hat.com>,
        Andreas Gruenbacher <agruenba@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Tejun Heo <tj@...nel.org>,
        Trond Myklebust <trond.myklebust@...merspace.com>,
        Anna Schumaker <anna@...nel.org>,
        Konstantin Komarov <almaz.alexandrovich@...agon-software.com>,
        Mark Fasheh <mark@...heh.com>,
        Joel Becker <jlbec@...lplan.org>,
        Joseph Qi <joseph.qi@...ux.alibaba.com>,
        Mike Marshall <hubcap@...ibond.com>,
        Martin Brandenburg <martin@...ibond.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        Steve French <sfrench@...ba.org>,
        Paulo Alcantara <pc@...guebit.com>,
        Ronnie Sahlberg <ronniesahlberg@...il.com>,
        Shyam Prasad N <sprasad@...rosoft.com>,
        Tom Talpey <tom@...pey.com>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Richard Weinberger <richard@....at>,
        Hans de Goede <hdegoede@...hat.com>,
        Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Amir Goldstein <l@...il.com>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Benjamin Coddington <bcodding@...hat.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        v9fs@...ts.linux.dev, linux-afs@...ts.infradead.org,
        linux-btrfs@...r.kernel.org, ceph-devel@...r.kernel.org,
        codalist@...a.cs.cmu.edu, ecryptfs@...r.kernel.org,
        linux-erofs@...ts.ozlabs.org, linux-ext4@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net, cluster-devel@...hat.com,
        linux-nfs@...r.kernel.org, ntfs3@...ts.linux.dev,
        ocfs2-devel@...ts.linux.dev, devel@...ts.orangefs.org,
        linux-cifs@...r.kernel.org, samba-technical@...ts.samba.org,
        linux-mtd@...ts.infradead.org, linux-mm@...ck.org,
        linux-unionfs@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [PATCH v7 12/13] ext4: switch to multigrain timestamps

On Wed 20-09-23 10:12:03, Jeff Layton wrote:
> On Wed, 2023-09-20 at 14:48 +0200, Jan Kara wrote:
> > On Wed 20-09-23 06:35:18, Jeff Layton wrote:
> > > On Wed, 2023-09-20 at 12:17 +0200, Jan Kara wrote:
> > > > If I were a sysadmin, I'd rather opt for something like
> > > > finegrained timestamps + lazytime (if I needed the finegrained timestamps
> > > > functionality). That should avoid the IO overhead of finegrained timestamps
> > > > as well and I'd know I can have problems with timestamps only after a
> > > > system crash.
> > > 
> > > > I've just got another idea how we could solve the problem: Couldn't we
> > > > always just report coarsegrained timestamp to userspace and provide access
> > > > to finegrained value only to NFS which should know what it's doing?
> > > > 
> > > 
> > > I think that'd be hard. First of all, where would we store the second
> > > timestamp? We can't just truncate the fine-grained ones to come up with
> > > a coarse-grained one. It might also be confusing having nfsd and local
> > > filesystems present different attributes.
> > 
> > So what I had in mind (and I definitely miss all the NFS intricacies so the
> > idea may be bogus) was that inode->i_ctime would be maintained exactly as
> > is now. There will be new (kernel internal at least for now) STATX flag
> > STATX_MULTIGRAIN_TS. fill_mg_cmtime() will return timestamp truncated to
> > sb->s_time_gran unless STATX_MULTIGRAIN_TS is set. Hence unless you set
> > STATX_MULTIGRAIN_TS, there is no difference in the returned timestamps
> > compared to the state before multigrain timestamps were introduced. With
> > STATX_MULTIGRAIN_TS we return full precision timestamp as stored in the
> > inode. Then NFS in fh_fill_pre_attrs() and fh_fill_post_attrs() needs to
> > make sure STATX_MULTIGRAIN_TS is set when calling vfs_getattr() to get
> > multigrain time.
> 
> > I agree nfsd may now be presenting slightly different timestamps than user
> > is able to see with stat(2) directly on the filesystem. But is that a
> > problem? Essentially it is a similar solution as the mgtime mount option
> > but now sysadmin doesn't have to decide on filesystem mount how to report
> > timestamps but the stat caller knowingly opts into possibly inconsistent
> > (among files) but high precision timestamps. And in the particular NFS
> > usecase where stat is called all the time anyway, timestamps will likely
> > even be consistent among files.
> > 
> 
> I like this idea...
> 
> Would we also need to raise sb->s_time_gran to something corresponding
> to HZ on these filesystems?

I was actually confused a bit about how timestamp_truncate() works. The
jiffie granularity is just direct consequence of current_time() using
ktime_get_coarse_real_ts64() and not of timestamp_truncate().
sb->s_time_gran seems to be more about the on-disk format so it doesn't
seem like a great idea to touch it. So probably we can just truncate
timestamps in generic_fillattr() to HZ granularity unconditionally.

> If we truncate the timestamps at a granularity corresponding to HZ before
> presenting them via statx and the like then that should work around the
> problem with programs that compare timestamps between inodes.

Exactly.

> With NFSv4, when a filesystem doesn't report a STATX_CHANGE_COOKIE, nfsd
> will fake one up using the ctime. It's fine for that to use a full fine-
> grained timestamp since we don't expect to be able to compare that value
> with one of a different inode.

Yes.

> I think we'd want nfsd to present the mtime/ctime values as truncated,
> just like we would with a local fs. We could hit the same problem of an
> earlier-looking timestamp with NFS if we try to present the actual fine-
> grained values to the clients. IOW, I'm convinced that we need to avoid
> this behavior in most situations.

I wasn't sure if there's a way to do this within NFS - i.e., if the value
communicated via NFSv3 protocol (I know v4 has a special change cookie
field for it) that gets used for detecting need to revalidate file contents
isn't the one presented to client's userspace as ctime. If there's a way to
do this then great, I'm all for presenting truncated timestamps even for
NFS.

> If we do this, then we technically don't need the mount option either.

Yes, that was my hope.

> We could still add it though, and have it govern whether fill_mg_cmtime
> truncates the timestamps before storing them in the kstat.

Well, if we decide these timestamps are useful for userspace as well, I'd
rather make that a userspace visible STATX flag than a mount option. So
applications aware of the pitfalls can get high precision timestamps
without possibly breaking unaware applications.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ