linux-kernel - Re: [RFC PATCH] file as directory

Message-Id: <E1HqrtF-0002Fb-00@dorka.pomaz.szeredi.hu>
Date:	Wed, 23 May 2007 16:32:37 +0200
From:	Miklos Szeredi <miklos@...redi.hu>
To:	viro@....linux.org.uk
CC:	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	akpm@...ux-foundation.org, torvalds@...ux-foundation.org
Subject: Re: [RFC PATCH] file as directory

> > > 	* invalidation on unlink is still an open problem.
> > > 	* locking in final mntput() doesn't look nice; we probably need
> > > a new refcounting scheme for vfsmounts to make that work.  I have a variant
> > > that might work here (and make life much easier for expiry logic in
> > > automount/shared trees, which is what it had been initially proposed for),
> > 
> > Which variant?  We had that "detached subtrees" thing, is that it?
> 
> Umm...  It is related to detached subtrees, but I'm not sure if it is what
> you are thinking about.

I was thinking of a similar one by Mike Waychison.  It had the problem
of requiring a spinlock for mntget/mntput.  It was also different in
that it did not gradually dissolve detached trees, but kept them as
whole blobs until the last ref went away.

> Short version of the story: new counter (mnt_busy) that would be defined
> in the following way: the number of external references (not due to the
> vfsmount tree structure or from namespace to root) + the number of
> children that have non-zero ->mnt_busy.  And a per-vfsmount flag ("goner").
> 
> The rules for handling ->mnt_busy:
> 	* duplicating external reference: increment m->mnt_busy
> 	* getting from m to child: increment child->mnt_busy, if it went
> from 0 to non-zero - increment m->mnt_busy as well (that's done under
> vfsmount_lock, so we can safely check for zero here).
> 	* getting from m to parent: increment parent->mnt_busy.
> 	* dropping external reference: decrement m->mnt_busy; if it's still
> non-zero, we are done.  If it's zero, we are in for some work (and had
> acquired vfsmount_lock by atomic_dec_and_lock()).  Here's what we do:
> 		* go through ancestors, decrementing ->mnt_busy, until we
> 		  hit the root or get to one with ->mnt_busy staying
> 		  non-zero.
> 		* find the most remote ancestor that has zero ->mnt_busy
> 		  and is marked as goner (might be m itself).
> 		* if no such beast exists, we are done.
> 		* otherwise, detach the subtree rooted in that ancestor
> 		  from its parent (if any) and unhash its root (if hashed).

How will this work with copy_tree() and namespace duplication, which
currently walk the tree with only namespace_sem held?

> 		  Now there are no external references to any vfsmount in that
> 		  subtree.
> 		* now we can kill all vfsmounts in that subtree.
> 	* detaching m from parent: nothing; we trade a busy child of parent
> for new external reference to parent.
> 	* lazy umount: in addition to detaching everything from parents
> and dropping resulting external references to parents, mark everything
> in the subtree as goners.
> 	* normal umount: check ->mnt_busy *and* lack of children, detach,
> mark as goner, drop resulting external reference to parent.
> 	* fun new stuff - umount of intact subtree: detach the subtree from
> parent, do *not* dissolve it, mark everything in subtree as goners.  If
> something we mark as goner is not busy, we can kill it and all its descendants.
> The subtree will be shrinking as its pieces lose external references.
> 	* check for expirability: "we hold an external reference to m and
> m->mnt_busy is 1".  No need to look into children, etc.
> 	* your vfsmounts: simply mark them goners from the very beginning.
>  
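
To make the counting rules quoted above concrete, here is a rough userspace model in Python. It is only a sketch under stated assumptions: the names (Mount, get, get_child, put, umount_subtree, expirable) are made up for illustration and are not kernel functions, there is no locking at all (nothing here models vfsmount_lock or atomic_dec_and_lock()), and "kill" just sets a flag. It also assumes that killing a goner takes every vfsmount below it along, as the quoted text says.

```python
class Mount:
    """Toy stand-in for a vfsmount; 'busy' models ->mnt_busy (hypothetical model)."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        self.busy = 0        # external refs + number of children with busy != 0
        self.goner = False
        self.dead = False
        if parent is not None:
            parent.children.append(self)

def subtree(m):
    """Return m followed by all of its descendants."""
    out = [m]
    for c in m.children:
        out.extend(subtree(c))
    return out

def get(m):
    """Duplicate an external reference."""
    m.busy += 1

def get_child(m, child):
    """Go from m to child; a child turning busy also counts once in m."""
    child.busy += 1
    if child.busy == 1:
        m.busy += 1

def get_parent(m):
    """Go from m to its parent."""
    m.parent.busy += 1
    return m.parent

def kill_subtree(top):
    """Detach top from its parent (if any) and kill everything below it."""
    if top.parent is not None:
        top.parent.children.remove(top)
        top.parent = None
    for v in subtree(top):
        v.dead = True

def put(m):
    """Drop an external reference."""
    m.busy -= 1
    if m.busy:
        return
    zeroed = [m]                 # m plus the ancestors that dropped to zero
    p = m.parent
    while p is not None:
        p.busy -= 1              # m (or the previous ancestor) stopped being busy
        if p.busy:
            break
        zeroed.append(p)
        p = p.parent
    goners = [v for v in zeroed if v.goner]
    if goners:
        kill_subtree(goners[-1])  # the most remote zeroed ancestor that is a goner

def reap(m):
    """Kill every non-busy piece of a subtree that was just marked as goners."""
    if m.busy == 0:
        kill_subtree(m)
    else:
        for c in list(m.children):
            reap(c)

def umount_subtree(m):
    """Umount of an intact subtree: detach it, keep its shape, mark all as goners."""
    parent = m.parent
    if parent is not None:
        parent.children.remove(m)
        m.parent = None
        if m.busy:
            put(parent)  # busy child traded for an external ref, dropped at once
    for v in subtree(m):
        v.goner = True
    reap(m)

def expirable(m):
    """Caller holds one external reference to m; nobody else uses m or anything below."""
    return m.busy == 1
```

With a chain root -> a -> b and a single external reference held on b, umount_subtree(a) leaves a and b alive, and the final put(b) is what kills both, which is the "shrinking as pieces lose external references" behaviour described above.
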
> > > but it still doesn't kill the need to deal with invalidation.  And
> > > yes, NFS still needs it (and so do all network filesystems, really).
> > > The question of caching is related to that.
> > 
> > So what's so special about invalidation?  Why not just treat
> > dir-on-file mounts the same as any other ref on the dentry?
> 
> Because of the case of having something mounted in that subtree.  The
> current code doesn't even try to evict such stuff.  NFS *does*, but
> it's not in a position to do that decently (not NFS's fault, it's just that
> we don't have the data needed for it).
> 
> Note that one problem we used to have back then is gone - namely, per-namespace
> semaphores.  It's a global semaphore now, so we *can* do cross-namespace
> rogering of mount trees without those kinds of locking horrors.
> 
> What we really need is "go through dentry subtree, try to evict everything
> we can, for anything that has stuff mounted on it go through all such
> vfsmounts and kick them and all their descendants out".  That's what should
> happen on invalidation.  From generic code, so that NFS wouldn't have to
> bother.
> 
> And _that_ is what we could call from ->unlink() on your inode - would take
> care of submounts.
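
As a way of reading the invalidation walk described above, a minimal Python model might look like this. Everything in it is an assumption made for illustration: the Dentry and Mount classes, the in_use flag standing in for "somebody else still holds a reference", and the function names are hypothetical, and no locking or namespace handling is modelled.

```python
class Mount:
    """Toy vfsmount: only tracks its child mounts and whether it was kicked out."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.detached = False

class Dentry:
    """Toy dentry (hypothetical model, not the kernel structure)."""
    def __init__(self, name, in_use=False):
        self.name = name
        self.in_use = in_use   # stands in for an outside reference
        self.children = []
        self.mounts = []       # every vfsmount mounted on this dentry
        self.evicted = False

def kick_out(mnt):
    """Kick out a vfsmount together with all of its descendants."""
    mnt.detached = True
    for c in mnt.children:
        kick_out(c)

def invalidate(dentry):
    """Walk the dentry subtree bottom-up, evicting whatever can be evicted."""
    for child in list(dentry.children):
        invalidate(child)
    for mnt in dentry.mounts:      # anything mounted here goes, submounts included
        kick_out(mnt)
    dentry.mounts = []
    dentry.children = [c for c in dentry.children if not c.evicted]
    if not dentry.in_use and not dentry.children:
        dentry.evicted = True
```

In this model a dentry that is still in use survives, and so does every ancestor of it, while anything mounted on a dentry in the subtree is kicked out along with its submounts.
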

OK, I'll digest this info.

Miklos