linux-kernel - Re: [PATCH] vfs namespace: Don't assume mount namespace has valid root

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20120314205809.GJ23916@ZenIV.linux.org.uk>
Date:	Wed, 14 Mar 2012 20:58:09 +0000
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Sasha Levin <levinsasha928@...il.com>
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	davej@...hat.com
Subject: Re: [PATCH] vfs namespace: Don't assume mount namespace has valid
 root

On Wed, Mar 14, 2012 at 08:10:48PM +0000, Al Viro wrote:
> On Wed, Mar 14, 2012 at 04:58:30PM -0400, Sasha Levin wrote:
> > This patch fixes the assumption that a mnt namespace will always have a valid
> > root object.
> 
> It's not an assumption, it's an invariant that should hold unless you have
> run into a bug somewhere.
> 
> Instances of struct mnt_namespace should *all* come from alloc_mnt_ns().
> There are only two callers - dup_mnt_namespace() and create_mnt_ns().
> The latter will assign non-NULL vfsmount to ->root or die NULL pointer
> dereference in
>                 mnt->mnt_ns = new_ns;
> The former will either assign non-NULL to ->root or kfree() mnt_namespace
> before anyone can see it.
> 
> And nothing should modify ->root after that assignment for as long as
> the instance of struct mnt_namespace is allocated.
> 
> Mind explaining how have you managed to get mnt_namespace with NULL ->root
> passed to dup_mnt_ns()?

As the matter of fact, all of the above is easily verified; very few places
see the internals of struct mnt_namespace (defined in fs/mount.h, included
only in fs/dcache.c, fs/fhandle.c, fs/namei.c,
fs/notify/{fanotify/fanotify_user.c,fsnotify.c,vfsmount_mark.c},
and fs/pnode.h, which is included by fs/namespace.c, fs/pnode.c and
fs/proc_namespace.c).  Nothing else knows what size the damn thing is,
nevermind the actual layout.

The lifetime rules are also simple and easy to verify: it's a plain refcount.
No direct manipulators, everything happens via get_mnt_ns()/put_mnt_ns().
As far as the outside cares,
	nsproxy->mnt_ns contributes to refcount; copy_mnt_ns() always
returns a new reference, either to exisiting instance or to freshly allocated
one.  put_mnt_ns() needs to be called on the other side...
	opened /proc/*/{mounts,mountinfo,mountstats} contributes to refcount
for as long as it's opened.

This is it; we actually end up acquiring one reference too many to initial
mnt_namespace, but that's not going to cause such effect.

The structure containing struct vfsmount has a pointer to struct mnt_namespace
in it.  That one is protected by namespace_sem, does *not* contribute to
refcount and is cleared when vfsmount (OK, struct mount containing it)
gets removed from namespace's ->list and set when it gets placed there.
Anything still on mnt_ns->list will get kicked out of there by umount_tree()
call from put_mnt_ns(), so those references can't outlive the instance
they point to.

And that's all pointers to mnt_namespace that ever exist, aside of function
arguments and local variables.  I'm not saying that I couldn't have possibly
fucked it up, but from rereading that code it doesn't look like we could
end up with dangling pointers to already freed instances...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/