linux-kernel - Re: [PATCH 03/34] teach move_mount(2) to work with OPEN_TREE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c7af263d-f265-7feb-8d40-0b0d773f24bf@gmail.com>
Date:   Thu, 11 Oct 2018 12:48:43 +0100
From:   Alan Jenkins <alan.christopher.jenkins@...il.com>
To:     David Howells <dhowells@...hat.com>
Cc:     viro@...iv.linux.org.uk, torvalds@...ux-foundation.org,
        ebiederm@...ssion.com, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, mszeredi@...hat.com
Subject: Re: [PATCH 03/34] teach move_mount(2) to work with OPEN_TREE_CLONE
 [ver #12]

On 11/10/2018 10:17, David Howells wrote:
> Alan Jenkins <alan.christopher.jenkins@...il.com> wrote:
>
>> # unshare --mount=private_mnt/child_ns --propagation=shared ls -l /proc/self/ns/mnt
> I think the problem is that the mount of the nsfs object done by unshare here
> pins the new mount namespace - but doesn't add the namespace's contents into
> the mount tree, so the mount struct cycle-detection code is bypassed.
>
> I think it's fine for all other namespaces, just not the mount namespace.
>
> It looks like this bug might theoretically exist upstream also, though I don't
> think there's any way to actually effect it given that mount() doesn't take a
> dirfd argument.
>
> The reason that you can do this with open_tree()/move_mount() is that it
> allows you to create a mount tree (OPEN_TREE_CLONE) that has no namespace
> assignment, pass it through the namespace switch and then attach it inside the
> child namespace.  The cross-namespace checks in do_move_mount() are bypassed
> because the root of the newly-cloned mount tree doesn't have one.
>
> Unfortunately, just searching the newly-cloned mount tree for a conflicting
> nsfs mount doesn't help because the potential loop could be hidden several
> levels deep.
>
> I think the simplest solution is to either reject a request for
> open_tree(OPEN_TREE_CLONE) if there are any nsfs objects in the source tree,
> or to just not copy said objects.
>
> David

Very clearly written, thank you.  Hum, your solution would mean 
open_tree(OPEN_TREE_CLONE) + move_mount() is not equivalent to the 
current `mount --rbind` :-(.  That does not fit the current patch 
description.

It sounds like you're under-estimating how we can use mnt_ns->seq (as is 
currently used in mnt_ns_loop()).  Or maybe I am over-estimating it :).

In principle, it should suffice for attach_recursive_mount() to check 
the NS sequence numbers of the NS files which are mounted. You can't 
hide the loop at a deeper level inside the NS, because of the existing 
mnt_ns_loop() check.

I think mnt_ns_loop() works 100% correctly upstream, and there is no 
memory leak bug there.  You can pass a mount NS fd between processes in 
arbitrary namespaces, and you can mount it with "mount --no-canonicalize 
--bind /proc/self/fd/3 /other_ns".  But mnt_ns_loop() will only allow 
the mount when the other NS is newer than your own mount namespace.

Upstream also covers mount propagation (and CLONE_NEWNS), by simply not 
propagating mounts of mount NS files.  ( See commit 4ce5d2b1a8fd "vfs: 
Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces" / 
https://unix.stackexchange.com/questions/473717/what-code-prevents-mount-namespace-loops-in-a-more-complex-case-involving-mount-propagation 
)

I think it is more a question of taste :-).  Would it be acceptable to 
prune the tree (or fail?) in move_mount() (and also `mount --move`, if 
you [ab]use it like I did) ?

I suspect we should prefer your solution.  It is clearly simpler, and I 
don't know that anyone really uses `mount --rbind` to clone trees of 
mount NS files.

Either way, I suggest we take care to say whether `mount --rbind` and 
`mount --bind` can be implemented using open_tree() + move_mount(), or 
whether we think it might be undesirable.  (E.g. because someone might 
read the current commit message, and desire to implement `mount 
--bind,ro` atomically, if/when we also have mount_setattr() ).

Regards

Alan


> ---
>
> Test script:
>
> 	mount -t tmpfs none /a
> 	mount --make-shared /a
> 	cd /a
> 	mkdir private_mnt
> 	mount -t tmpfs xxx private_mnt
> 	mount --make-private private_mnt
> 	touch private_mnt/child_ns
> 	unshare --mount=private_mnt/child_ns --propagation=shared \
> 	    ls -l /proc/self/ns/mnt
> 	findmnt
>
> 	~/open_tree 3</a/private_mnt 3 \
> 	    nsenter --mount=/a/private_mnt/child_ns \
> 	    sh -c '~/move_mount 4</mnt'
>
> 	grep Shmem: /proc/meminfo
> 	dd if=/dev/zero of=/a/private_mnt/bigfile bs=1M count=10
>
> 	umount -l /a/private_mnt/
> 	grep Shmem: /proc/meminfo