Message-ID: <d771ea4d44f3c9da8470d0aa9d58ee1d96f5fb30.camel@kernel.org>
Date: Thu, 19 Sep 2024 18:50:59 +0200
From: Jeff Layton <jlayton@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>, John Stultz <jstultz@...gle.com>, 
 Stephen Boyd <sboyd@...nel.org>, Alexander Viro <viro@...iv.linux.org.uk>,
 Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, Steven
 Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Jonathan Corbet
 <corbet@....net>, Chandan Babu R <chandan.babu@...cle.com>, "Darrick J.
 Wong" <djwong@...nel.org>, Theodore Ts'o <tytso@....edu>, Andreas Dilger
 <adilger.kernel@...ger.ca>, Chris Mason <clm@...com>, Josef Bacik
 <josef@...icpanda.com>, David Sterba <dsterba@...e.com>,  Hugh Dickins
 <hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, Chuck Lever
 <chuck.lever@...cle.com>, Vadim Fedorenko <vadim.fedorenko@...ux.dev>
Cc: Randy Dunlap <rdunlap@...radead.org>, linux-kernel@...r.kernel.org, 
 linux-fsdevel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
 linux-doc@...r.kernel.org, linux-xfs@...r.kernel.org,
 linux-ext4@...r.kernel.org,  linux-btrfs@...r.kernel.org,
 linux-nfs@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v8 01/11] timekeeping: move multigrain timestamp floor
 handling into timekeeper

On Mon, 2024-09-16 at 12:12 +0200, Thomas Gleixner wrote:
> On Sat, Sep 14 2024 at 13:07, Jeff Layton wrote:
> > For multigrain timestamps, we must keep track of the latest timestamp
> 
> What is a multigrain timestamp? Can you please describe the concept
> behind it? I'm not going to chase random documentation or whatever
> because change logs have to be self-contained.
> 
> And again 'we' do nothing. Describe the problem in technical terms and
> do not impersonate code.
> 

Hi Thomas!

Sorry for the delay in responding. I'll try to summarize below, but
I'll also note that patch #7 in the v8 series adds a file to
Documentation/ that explains this in a bit more depth:

Currently the kernel always stamps files (mtime, ctime, etc.) using the
coarse-grained clock. This is usually a good thing, since it reduces
the number of metadata updates, but it means that you can't reliably use
file timestamps to detect whether a file has changed since it was last
checked: two modifications within the same timer tick get the same
timestamp. This is particularly a problem for NFSv3 clients, which use
timestamps to know when to invalidate their pagecache for an inode [1].
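A toy userspace model of why coarse timestamps can hide a change. This is not kernel code: the 4 ms tick (HZ=250) and the names TICK_NS, coarse_ns() and changed_since() are assumptions for illustration. The coarse clock is modelled as the fine-grained time rounded down to the start of its tick, so two writes in the same tick produce identical mtimes and a client comparing against its cached value sees no change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick length: 4 ms (HZ=250). The real value depends on the config. */
#define TICK_NS 4000000LL

/* Model of the coarse clock: the fine-grained time (ns) rounded down
 * to the start of the tick it falls in. */
static int64_t coarse_ns(int64_t fine_ns)
{
    assert(fine_ns >= 0); /* the modulo below assumes a non-negative time */
    return fine_ns - (fine_ns % TICK_NS);
}

/* What a cache-validating client effectively does: treat any change in
 * the timestamp as "the file changed since I last looked". */
static bool changed_since(int64_t cached_mtime, int64_t current_mtime)
{
    return current_mtime != cached_mtime;
}
```

For example, writes at 1 ms and 2 ms both get coarse_ns() == 0, so changed_since(0, 0) is false and the second write goes unnoticed; with fine-grained stamps (1000000 vs 2000000) the change would be detected.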

The idea is to allow the kernel to use fine-grained timestamps (mtime
and ctime) on files when they are under direct observation. When a task
does a ->getattr against an inode for STATX_MTIME or STATX_CTIME, a
flag is set in the inode that tells the kernel to use the fine-grained
clock for the timestamp update iff the current coarse-grained clock
value would not cause a change to the mtime/ctime.
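The decision described above can be sketched as a toy model rather than the kernel implementation: struct toy_inode, its queried field, toy_getattr() and toy_update_ctime() are hypothetical names, only the ctime is modelled, and the same assumed 4 ms tick is used. In this sketch the flag is set when the timestamp is observed and cleared whenever a new timestamp is stored, since nobody has seen that value yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick length: 4 ms (HZ=250). */
#define TICK_NS 4000000LL

/* Toy inode: only the ctime and a "someone looked at it" flag. */
struct toy_inode {
    int64_t ctime_ns;
    bool queried;
};

/* Model of the coarse clock: fine-grained time rounded down to its tick. */
static int64_t coarse_ns(int64_t fine_ns)
{
    assert(fine_ns >= 0);
    return fine_ns - (fine_ns % TICK_NS);
}

/* getattr for the ctime: report it and remember that it was observed. */
static int64_t toy_getattr(struct toy_inode *inode)
{
    inode->queried = true;
    return inode->ctime_ns;
}

/* Timestamp update, given the current fine-grained clock reading. */
static void toy_update_ctime(struct toy_inode *inode, int64_t now_fine_ns)
{
    int64_t coarse = coarse_ns(now_fine_ns);
    int64_t next;

    if (coarse > inode->ctime_ns)
        next = coarse;          /* coarse value already shows a change */
    else if (inode->queried)
        next = now_fine_ns;     /* observed: use the fine-grained clock */
    else
        return;                 /* unobserved: leave the timestamp alone */

    inode->ctime_ns = next;
    inode->queried = false;     /* nobody has seen the new value yet */
}
```

Starting from ctime 0, an unobserved write at 1 ms leaves the ctime at 0; after a getattr, a write at 2 ms stores the fine-grained 2000000; an unobserved write at 3 ms changes nothing; a write at 5 ms stores the coarse 4000000.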

This works, but there is a problem:

It's possible for one inode to get a fine-grained timestamp, and then
another to get a coarse-grained timestamp. If this happens within a
single coarse-grained timer tick, then the files may appear to have
been modified in reverse order, which breaks POSIX guarantees (and
obscure programs like "make").
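A worked example of the inversion using the same toy clock (assumed 4 ms tick; needs_rebuild() is a hypothetical stand-in for make's "prerequisite newer than target" check). A target written at 2.5 ms gets a fine-grained stamp because something had just queried its timestamps; its source is then edited at 3.0 ms and gets the coarse stamp 0. The source now looks older than the target, so the edit is missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick length: 4 ms (HZ=250). */
#define TICK_NS 4000000LL

/* Model of the coarse clock: fine-grained time rounded down to its tick. */
static int64_t coarse_ns(int64_t fine_ns)
{
    assert(fine_ns >= 0);
    return fine_ns - (fine_ns % TICK_NS);
}

/* make-style check (hypothetical helper): rebuild the target if a
 * prerequisite's mtime is newer than the target's. */
static bool needs_rebuild(int64_t target_mtime, int64_t prereq_mtime)
{
    return prereq_mtime > target_mtime;
}
```

Here needs_rebuild(2500000, coarse_ns(3000000)) is false even though the source was modified after the target; had the source also received a fine-grained stamp (3000000), the check would be true.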

The fix for this is to establish a floor value for the coarse-grained
clock. When stamping a file with a fine-grained timestamp, we update
the floor value with the current monotonic time (using cmpxchg). Later,
when a coarse-grained timestamp is requested, the kernel checks whether
the floor is later than the current coarse-grained time. If it is, the
kernel returns the floor value (converted to realtime) instead of the
current coarse-grained clock value. That allows us to maintain the
ordering guarantees.
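The floor logic can be sketched in userspace with C11 atomics, with atomic_compare_exchange_weak() playing the role of cmpxchg. This is an illustration under stated assumptions, not the timekeeper code: it uses a single nanosecond clock instead of converting from monotonic to realtime, the floor is a hypothetical global mg_floor, and clock readings are passed in as arguments so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed tick length: 4 ms (HZ=250). */
#define TICK_NS 4000000LL

/* Hypothetical floor: the latest fine-grained stamp handed out so far. */
static _Atomic int64_t mg_floor;

/* Model of the coarse clock: fine-grained time rounded down to its tick. */
static int64_t coarse_ns(int64_t fine_ns)
{
    assert(fine_ns >= 0);
    return fine_ns - (fine_ns % TICK_NS);
}

/* Fine-grained stamp: try to raise the floor to now_fine_ns. */
static int64_t stamp_fine(int64_t now_fine_ns)
{
    int64_t old = atomic_load(&mg_floor);

    /* On failure, compare_exchange reloads 'old' with the current floor,
     * so the loop stops once the floor is already >= now_fine_ns. */
    while (old < now_fine_ns &&
           !atomic_compare_exchange_weak(&mg_floor, &old, now_fine_ns))
        ;

    /* If another stamper already published a later value, reuse it. */
    return old > now_fine_ns ? old : now_fine_ns;
}

/* Coarse stamp: never earlier than the floor. */
static int64_t stamp_coarse(int64_t now_fine_ns)
{
    int64_t coarse = coarse_ns(now_fine_ns);
    int64_t floor_ns = atomic_load(&mg_floor);

    return floor_ns > coarse ? floor_ns : coarse;
}
```

After stamp_fine(2500000), a coarse request at 3 ms returns 2500000 rather than 0, so a file stamped later can no longer look older. Once the coarse clock passes the floor (e.g. at 5 ms, giving 4000000), the plain coarse value is used again. Reusing a newer floor when the exchange loses a race keeps fine-grained stamps from landing below it.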

My original implementation of this tracked the floor value in
fs/inode.c (also using cmpxchg), but that caused a performance
regression, mostly due to multiple calls into the timekeeper functions,
each with its own seqcount loop. By adding the floor to the timekeeper,
we can get that back down to a single seqcount loop.

Let me know if you have more questions about this, or suggestions about
how to do this better. The timekeeping code is not my area of expertise
(obviously) so I'm open to doing this a better way if there is one.

Thanks for the review so far!

[1]: NFSv4 mandates an opaque change attribute (usually implemented
using inode->i_version), but not every filesystem has a proper
implementation of it (XFS being the notable exception). For filesystems
without one, we end up using the ctime to generate a change attribute,
which means that NFSv4 has the same problem on those filesystems.
i_version also doesn't help NFSv3 clients and servers.
-- 
Jeff Layton <jlayton@...nel.org>
