linux-ext4 - Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing

Message-ID: <5ef49a42e95a5cb1a0ce77766c13e9f227cb446e.camel@hammerspace.com>
Date:   Wed, 1 Nov 2023 22:45:29 +0000
From:   Trond Myklebust <trondmy@...merspace.com>
To:     "torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>
CC:     "hughd@...gle.com" <hughd@...gle.com>,
        "josef@...icpanda.com" <josef@...icpanda.com>,
        "jstultz@...gle.com" <jstultz@...gle.com>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "djwong@...nel.org" <djwong@...nel.org>, "clm@...com" <clm@...com>,
        "chandan.babu@...cle.com" <chandan.babu@...cle.com>,
        "david@...morbit.com" <david@...morbit.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "dsterba@...e.com" <dsterba@...e.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "jlayton@...nel.org" <jlayton@...nel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "tytso@....edu" <tytso@....edu>,
        "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
        "jack@...e.cz" <jack@...e.cz>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
        "amir73il@...il.com" <amir73il@...il.com>,
        "linux-btrfs@...r.kernel.org" <linux-btrfs@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "adilger.kernel@...ger.ca" <adilger.kernel@...ger.ca>,
        "kent.overstreet@...ux.dev" <kent.overstreet@...ux.dev>,
        "sboyd@...nel.org" <sboyd@...nel.org>,
        "dhowells@...hat.com" <dhowells@...hat.com>,
        "jack@...e.de" <jack@...e.de>
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain
 timestamp handing

On Wed, 2023-11-01 at 12:23 -1000, Linus Torvalds wrote:
> On Wed, Nov 1, 2023, 11:35 Trond Myklebust <trondmy@...merspace.com>
> wrote:
> > 
> > My client writes to the file and immediately reads the ctime. A 3rd
> > party client then writes immediately after my ctime read.
> > A reboot occurs (maybe minutes later), then I re-read the ctime,
> > and
> > get the same value as before the 3rd party write.
> > 
> > Yes, most of the time that is better than the naked ctime, but not
> > across a reboot.
> 
> Ahh, I knew I was missing something.
> 
> But I think it's fixable, with an additional rule:
> 
>  - when generating STATX_CHANGE_COOKIE, if the ctime matches the
> current time and the ctime counter is zero, set the ctime counter to
> 1.
> 
> That means that you will have *spurious* cache invalidations of such
> cached data after a reboot, but only for reads that happened right
> after the file was written.

Presumably those spurious invalidations will also happen if the file
gets kicked out of cache on the server, since that will cause the
I_VERSION_QUERIED flag and any other in-memory metadata to be lost.
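
To make the scenario concrete, here is a toy model of the rule as I read
it. It is only a sketch, not kernel code. It assumes that only the coarse
ctime survives a reboot or an eviction from cache, while the counter and
the "queried" flag live in memory. ToyInode, drop_from_memory() and
scenario() are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyInode:
    ctime: int             # coarse timestamp; assumed to be the only persistent part
    counter: int = 0       # assumed in-memory only
    queried: bool = False  # stand-in for I_VERSION_QUERIED

    def write(self, now):
        if now != self.ctime:
            # The coarse clock has moved on: new ctime, counter starts over.
            self.ctime = now
            self.counter = 0
        elif self.queried:
            # Same coarse tick, but someone has seen the old value: bump.
            self.counter += 1
        self.queried = False

    def change_cookie(self, now, extra_rule):
        self.queried = True
        # The additional rule: never hand out a zero counter for a ctime
        # that still equals the current time.
        if extra_rule and self.ctime == now and self.counter == 0:
            self.counter = 1
        return (self.ctime, self.counter)

    def drop_from_memory(self):
        # Reboot or cache eviction: in-memory state is gone.
        self.counter = 0
        self.queried = False


def scenario(extra_rule, third_party_write):
    """Return True if the client sees a changed cookie after the reboot."""
    inode = ToyInode(ctime=0)
    inode.write(now=100)                       # my client writes
    before = inode.change_cookie(100, extra_rule)
    if third_party_write:
        inode.write(now=100)                   # same coarse tick
    inode.drop_from_memory()
    after = inode.change_cookie(500, extra_rule)
    return before != after
```

In this model, scenario(False, True) returns False (the third-party write
is missed), scenario(True, True) returns True (it is caught), and
scenario(True, False) also returns True, which is the spurious
invalidation. Calling drop_from_memory() for a cache eviction rather than
a reboot gives the same outcome.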

> 
> Now, it's obviously not unheard of to finish writing a file, and then
> immediately reading the results again.
> 
> But at least those caches should be somewhat limited (and the problem
> then only happens when the nfs server is rebooted).
> 
> I *assume* that the whole thundering herd issue with lots of clients
> tends to be for stable files, not files that were just written and
> then immediately cached?
> 
> I dunno. I'm sure there's still some thinko here.

Close-to-open cache consistency means that the client is usually
expected to check the change attribute (or ctime) on file close and
file open. So it is not uncommon for it to have to revalidate the cache
not long after finishing writing the file. Of course, it is rare to
have another client interject with another write to the same file just
a few microseconds after it was closed; however, it is extremely common
for that sort of behaviour to occur with directories.
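
As a rough illustration of that revalidation pattern, here is a much
simplified sketch. It is not how the Linux NFS client is implemented;
ToyServer and ToyClient are hypothetical names, and the change attribute
is modelled as a plain integer that the server bumps on every write. In
particular, close() simply records whatever attribute the server reports,
which a real client cannot do without checking for intervening writes.

```python
class ToyServer:
    def __init__(self):
        self.data = b""
        self.change_attr = 0

    def write(self, data):
        self.data = data
        self.change_attr += 1  # every write is assumed to be visible here


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.cached_data = None
        self.cached_attr = None
        self.dirty = None

    def open(self):
        # Revalidate on open: compare the change attribute with the cached one.
        attr = self.server.change_attr
        if attr != self.cached_attr:
            self.cached_data = None
            self.cached_attr = attr

    def read(self):
        # Between open and close, the cache is trusted without rechecking.
        if self.cached_data is None:
            self.cached_data = self.server.data
        return self.cached_data

    def write(self, data):
        self.dirty = data
        self.cached_data = data

    def close(self):
        # Flush dirty data on close, then record the resulting attribute.
        if self.dirty is not None:
            self.server.write(self.dirty)
            self.dirty = None
        self.cached_attr = self.server.change_attr


server = ToyServer()
a, b = ToyClient(server), ToyClient(server)

a.open(); a.write(b"v1"); a.close()
a.open()
first = a.read()                      # served from a's cache
b.open(); b.write(b"v2"); b.close()   # another client writes
stale = a.read()                      # a has not reopened yet
a.open()
fresh = a.read()                      # attribute changed, cache dropped
```

The final open() only notices b's write because the attribute moved. If
the attribute were a coarse ctime that did not change between the two
writes, that check would pass and the client would keep serving the old
data.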

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com