linux-kernel - Re: How to manage shared persistent local caching (FS-Cache) with NFS?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <16896.1197075135@redhat.com>
Date:	Sat, 08 Dec 2007 00:52:15 +0000
From:	David Howells <dhowells@...hat.com>
To:	Chuck Lever <chuck.lever@...cle.com>
Cc:	dhowells@...hat.com, Peter Staubach <staubach@...hat.com>,
	Trond Myklebust <trond.myklebust@....uio.no>,
	nfsv4@...ux-nfs.org, linux-kernel@...r.kernel.org
Subject: Re: How to manage shared persistent local caching (FS-Cache) with NFS?

Chuck Lever <chuck.lever@...cle.com> wrote:

> Why not encode the local mounted-on directory in the key?

Can't.  Namespaces.  chroot.

> Meaning your cache is at quota all the time, and to continue operation it must
> eject items constantly.

I've thought about that, thank you.  Go and read the documentation.  There's
configurable hysteresis in the culling algorithm.

> This is a scenario where it pays to cache the read-mostly items on disk, and
> leave the frequently changing items in memory.

Currently any file which is opened for writing is automatically ejected from
the cache.

> The economics of disk caches is different than memory caches.  Disk caches are
> much larger and cheaper, but their performance tanks when  they have to track
> frequently changing files.  Memory caches are  smaller, but tracking
> frequently changing data is only a little more  expensive than tracking data
> that doesn't change often.

I'm aware of all that.  My OLS slides and paper can be found here:

	http://people.redhat.com/~dhowells/fscache/fscache-ols2006.odp
	http://people.redhat.com/~dhowells/fscache/FS-Cache.pdf

Lots of small files also hurt worse than fewer big files in some ways.  Lots
more metadata in the cache.  On the other hand, fragmentation is less of a
problem.

Anyway, this is straying off the main topic.

> I think it's key to preventing FS-cache from making performance worse in many
> common scenarios.

Perhaps.  The problem is that NFS doesn't know what the access pattern on a
file is expected to be.  I've been asked to provide fine-grained cache
controls (perhaps directory level), but Al Viro was, erm, luke warm in his
reception to that idea.

Gathering statistical data dynamically has performance penalties of its own:-/

> Disconnected operation for NFS is fraught with challenges.  Access to data on
> servers is traditionally gated by the client's IP address,  for example.  The
> client may disconnect from the network, then  reconnect using a different
> address where suddenly all of its  accesses are rebuffed.

Agreed, but isn't that one of the design goals for NFS4?

It's also something of interest to other netfs's that might want to use
FS-Cache.  This isn't an NFS-only facility.

> NFS servers, not clients, traditionally determine the file's mtime and ctime,
> and its file handle.  So file updates and file creation  become problematic.
> The client has to reconcile the server's file  handle, for files created
> offline, with its own when reconnecting.

Yes.  Basically it's a major can of major worms.  Doesn't stop people wanting
it, though.

> And, for disconnected operation, the cache is required to contain every item
> from the remote.  You can't just drop items from the cache  because they are
> inconvenient.

Yes.  That's what pinning and reservations are for.

Currently, support for disconnected operations is an idea I'd like to have,
but is otherwise mostly non-existent.

> That something might be the pathname of the mounted-on directory or of the
> file itself.

See above.

> Yes, they do.  The combination of mount options and mounted-on directory (or
> local pathname to the file) gives you a unique identity  for that view.

See above.

> So an item is cached in memory until space becomes available in the disk
> cache?

The item isn't considered for caching until space becomes available in the
disk cache.  It's put on a queue for potential caching, but won't actually be
cached if it gets discarded from the icache or pagecache before being cached.

It's unfortunate, but with a fast network you can download data faster than
you can make space in the cache.  unlink() and rmdir() are (a) slow and (b)
synchronous.  Each unlink() or rmdir() operation requires a task to perform
it, and that task is committed until the op finishes.

I could actually improve cachefilesd (the userspace cache culler) by giving it
multiple threads.

However, having cachefilesd doing lots of parallel synchronous, journalled
disk ops hurts performance in other ways I've noticed:-/

Again, hysteresis is available.  We stop writing stuff into the cache beyond
a limit until the free space drops sufficiently below that limit that we've
got a good go at writing a load new stuff, rather than just a block here and a
block there.

It's all very icky, and depends as much on the filesystem underlying the cache
(ext3 for example) and *its* configuration, as the characteristics of the
netfs and the network link.  It's all about compromise.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/