linux-kernel - Re: Security issues with local filesystem caching

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1163599542.5691.84.camel@lade.trondhjem.org>
Date:	Wed, 15 Nov 2006 09:05:42 -0500
From:	Trond Myklebust <trond.myklebust@....uio.no>
To:	David Howells <dhowells@...hat.com>
Cc:	Stephen Smalley <sds@...ho.nsa.gov>,
	Karl MacMillan <kmacmill@...hat.com>, jmorris@...ei.org,
	chrisw@...s-sol.org, selinux@...ho.nsa.gov,
	linux-kernel@...r.kernel.org, aviro@...hat.com
Subject: Re: Security issues with local filesystem caching

On Tue, 2006-11-14 at 19:22 +0000, David Howells wrote:
> Trond Myklebust <trond.myklebust@....uio.no> wrote:
> 
> > > Avoiding context switches aren't the main problem; avoiding serialisation
> > > is.
> > 
> > Why? It is a backing cache. The only case where serialisation ought to
> > bother you is the case where the client has to invalidate the cache due
> > to a server-side update of the file.
> 
> Cache invalidation is not so much of a problem as at that point we know exactly
> where whatever it is that we're invalidating is, and if it's a big object we
> just move it somewhere for the userspace daemon to splat.
> 
> The serialisation problem is that if we put cache lookup in its own thread,
> then in effect every open[*] of an NFS file, AFS file, whatever, will be
> serialised.
>
> It almost certainly wouldn't matter if what we did was to asynchronously look
> up the cache cookie for a file.  In the common case, an open(O_RDONLY) syscall
> is followed almost immediately by a read(), so there's not much to be gained
> from asynchronising things as the cache cookie has to be available by the time
> we come to process the read, but we can't get the cache cookie before
> completing the server checks made by open, as we need the coherency data before
> attempting to acquire a cookie.

No. All you should need is the result of the lookup(). The coherency
data needs to be checked against an eventual existing CacheFiles entry
during the call to read(), not before.

> The serialisation would stem from having to do several synchronous filesystem
> ops for each cache lookup, but only having one thread in which to do them.
> Okay, I could have several worker threads, but why?  Each process attempting to
> access the cache provides me with a suitable worker thread, and then I can have
> as many as there are tasks on the system.

Umm...because the former is a model which actually fits your security
requirements (i.e. one privileged daemon gets lookup()+open()+mkdir()...
rights on the CacheFiles partition), whereas the latter is not (all
tasks need lookup()+open()+mkdir().... privileges)?

> [*] Note that for NFS I've now incorporated a patch from Steve Dickson to
> acquire file cookies on the NFS open() file op, rather than during iget because
> NFS readdir calls iget.
> 
> > Once the RPC calls have been launched, the process returns to the VM
> > layer and just waits for the next page to be unlocked. It never returns
> > to the filesystem layer. So where are you using the process context to
> > write out the cached data?
> 
> What do you mean by "write out the cached data"?  Do you mean write the data to
> the cache?
> 
> If so, that'd be nfs_readpage_to_fscache() as called from nfs_readpage_sync()
> or nfs_readpage_release().

nfs_readpage_release() is an rpciod context, _not_ a user thread
context.

> That calls fscache_write_page() which calls cachefiles_write_page() which calls
> generic_file_buffered_write_one_kernel_page().
> 
> That last copies the data into the pagecache attached to an ext3 inode to be
> written out (hopefully) asynchronously.
> 
> However, that may do other disk accesses, I suppose, as it calls
> prepare_write() and commit_write() on ext3.

Which would generally be forbidden under the rpciod context, BTW, since
they imply calls to generic memory allocation (== nasty tricksy deadlock
potential, since rpciod may be called upon to help write out NFS pages
via shrink_page_list and friends).

> I could try and make it asynchronous, but that means more overhead in other
> ways:-(  I presume this will then sometimes be running in rpciod context?
> 
> > The cookie lookups need to be synchronous, but why would the file
> > creation need to be synchronous? Creating the cachefs file and waiting
> > on that to complete etc are all utterly useless activities as far as
> > satisfying the user request for data goes. Just start the process of
> > creating a backing file, and then get on with the actual syscall.
> 
> vfs_mkdir() is synchronous.  vfs_create() is synchronous.  vfs_[sg]etxattr is
> synchronous.  Lookup is synchronous.

All of them are synchronous as far as accessing the remote filesystem is
concerned. Why would the user process care if a privileged daemon has
completed the shadow mkdir() or create() on the CacheFiles system or
not?

> Yes, I could make them all asynchronous, but it'd be a lot more work, and
> mostly unnecessary, and I'd probably have to fight down lots of objections.
> 
> 
> Remember: in the common case, open(O_RDONLY) is going to be followed quickly by
> a read().  I suppose there may be an intervening stat() and malloc(), but even
> so...

Which is why lookup() + open() + read() needs to be fast in the case
where you have a CacheFiles hit. It does not justify mkdir, create, etc
being fast, nor does it justify the open() + read() part needing to be
fast in the case of a CacheFiles miss.

  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/