linux-kernel - Re: [RFC PATCH] file as directory

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070523135140.GW4095@ftp.linux.org.uk>
Date:	Wed, 23 May 2007 14:51:40 +0100
From:	Al Viro <viro@....linux.org.uk>
To:	Miklos Szeredi <miklos@...redi.hu>
Cc:	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	akpm@...ux-foundation.org, torvalds@...ux-foundation.org
Subject: Re: [RFC PATCH] file as directory

On Wed, May 23, 2007 at 03:01:38PM +0200, Miklos Szeredi wrote:
> Someone might think of a way to make those work with directories.
> Invisible directory entries, anyone?  </me ducks>

Not unless you manage to get working union-mount [*NOT* unionfs]
 
> > 	* invalidation on unlink is still an open problem.
> > 	* locking in final mntput() doesn't look nice; we probably need
> > a new refcounting scheme for vfsmounts to make that work.  I have a variant
> > that might work here (and make life much easier for expiry logics in
> > automount/shared trees, which is what it had been initially proposed for),
> 
> Which variant?  We had that "detached subtrees" thing, is that it?

Umm...  It is related to detached subtrees, but I'm not sure if it is what
you are thinking about.

Short version of the story: new counter (mnt_busy) that would be defined
in the following way: the number of external references (not due to the
vfsmount tree structure or from namespace to root) + the number of
children that have non-zero ->mnt_busy.  And a per-vfsmount flag ("goner").

The rules for handling ->mnt_busy:
	* duplicating external reference: increment m->mnt_busy
	* getting from m to child: increment child->mnt_busy, if it went
from 0 to non-zero - increment m->mnt_busy as well (that's done under
vfsmount_lock, so we can safely check for zero here).
	* getting from m to parent: increment parent->mnt_busy.
	* dropping external reference: decrement m->mnt_busy; if it's still
non-zero, we are done.  If it's zero, we are in for some work (and had
acquired vfsmount_lock by atomic_dec_and_lock()).  Here's what we do:
		* go through ancestors, decrementing ->mnt_busy, until we
		  hit the root or get to one with ->mnt_busy staying
		  non-zero.
		* find the most remote ancestor that has zero ->mnt_busy
		  and is marked as goner (might be m itself).
		* if no such beast exists, we are done.
		* otherwise, detach the subtree rooted in that ancestor
		  from its parent (if any) and unhash its root (if hashed).
		  Now there is no external references to any vfsmount in that
		  subtree.
		* now we can kill all vfsmounts in that subtree.
	* detaching m from parent: nothing; we trade a busy child of parent
for new external reference to parent.
	* lazy umount: in addition to detaching everything from parents
and dropping resulting external references to parents, mark everything
in the subtree as goners.
	* normal umount: check ->mnt_busy *and* lack of children, detach,
mark as goner, drop resulting external reference to parent.
	* fun new stuff - umount of intact subtree: detach the subtree from
parent, do *not* dissolve it, mark everything in subtree as goners.  If
something we mark as goner is not busy, we can kill it and all its descendents.
The subtree will be shrinking as its pieces lose external references.
	* check for expirability: "we hold an external reference to m and
m->mnt_busy is 1".  No need to look into children, etc.
	* your vfsmounts: simply mark them goners from the very beginning.
 
> > but it still doesn't kill the need to deal with invalidation.  And
> > yes, NFS still needs it (and so do all network filesystems, really).
> > The question of caching is related to that.
> 
> So what's so special about invalidation?  Why not just treat
> dir-on-file mounts the same as any other ref on the dentry?

Because of the case of having something mounted in that subtree.  The
current code doesn't even try to evict such stuff.  NFS *does*, but
it's not in position to do that decently (not NFS fault, it's just that
we don't have the data needed for it).

Note that one problem we used to have back then is gone - namely, per-namespace
semaphores.  It's a global semaphore now, so we *can* do cross-namespace
rogering of mount trees without that kind of locking horrors.

What we really need is "go through dentry subtree, try to evict everything
we can, for anything that has stuff mounted on it go through all such
vfsmounts and kick them and all their descendents out".  That's what should
happen on invalidation.  From generic code, so that NFS wouldn't have to
bother.

And _that_ is what we could call from ->unlink() on your inode - would take
care of submounts.

Note that I'm not all that happy about this scheme; we might make it work,
but I still want to see a good use scenario for that kind of stuff.
Invalidation logics is a separate story - it's simply needed for existing
stuff; that area sucks *badly*, regardless of adding these hybrid objects.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/