linux-kernel - Re: [PATCH -mm 5/7] add user namespace

Message-Id: <50EA0F5E-5C46-4395-A4AE-01702B82C51C@mac.com>
Date:	Sat, 15 Jul 2006 09:25:39 -0400
From:	Kyle Moffett <mrmacman_g4@....com>
To:	ebiederm@...ssion.com (Eric W. Biederman)
Cc:	Trond Myklebust <trond.myklebust@....uio.no>,
	Dave Hansen <haveblue@...ibm.com>,
	"Serge E. Hallyn" <serue@...ibm.com>,
	Cedric Le Goater <clg@...ibm.com>,
	linux-kernel@...r.kernel.org, Andrew Morton <akpm@...l.org>,
	Kirill Korotaev <dev@...nvz.org>, Andrey Savochkin <saw@...ru>,
	Herbert Poetzl <herbert@...hfloor.at>,
	Sam Vilain <sam.vilain@...alyst.net.nz>
Subject: Re: [PATCH -mm 5/7] add user namespace

On Jul 15, 2006, at 08:35:18, Eric W. Biederman wrote:
> Kyle Moffett <mrmacman_g4@....com> writes:
>> With NFS and the proposed superblock-sharing patches (necessary for
>> efficiency and other reasons I don't entirely understand), the
>> situation is worse:  A mount of server:/foo/bar on / in the bar
>> virtual machine may get its superblock merged with a mount of
>> server:/foo/baz on / in the baz virtual machine.  If it's efficient
>> to merge those superblocks we should, and once again it's necessary
>> to tie the UID namespace to the vfsmount, not the superblock.
>
> I completely agree that pushing nameidata down into  
> generic_permission where we can use per mount properties in our  
> permission checks is ideal.  The benefit I see is just a small  
> increase in flexibility. So I don't really care either way.
>
> Currently there are several additional flags that could benefit
> from a per vfsmount interpretation as well:  nosuid, noexec, nodev,
> and readonly.  How do we handle those?
>
> noexec is on the vfsmount.
> nosuid is on the vfsmount
> nodev  is on the vfsmount
> readonly is not on the vfsmount.
>
> The existing precedent is clearly in favor of putting this kind of
> information on the vfsmount.  The read-only attribute seems to be
> the only holdout.  If readonly has deep implications like no
> journal replay it makes sense to keep it per mount.  Which
> indicates we could use a nowrite option to express the per
> vfsmount property.

Well, speaking of that, there's been another thread recently that's
splitting the properties of read-only between vfsmount and
superblock.  So a read-only superblock implies read-only vfsmounts,
but the following can create a read-only vfsmount for a writable
superblock:

   mount --bind / /mnt/read-only-root
   mount -o ro,remount /mnt/read-only-root

So the readonly special case will go away.
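
To make the split concrete, here is a minimal userspace sketch of how
the two flags could combine.  The struct and field names are
hypothetical stand-ins I made up for illustration, not the actual
kernel definitions or the code from that other thread:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel objects; the
 * names are assumptions for illustration only. */
struct ro_super_block {
    bool s_readonly;              /* whole filesystem is read-only */
};

struct ro_vfsmount {
    struct ro_super_block *mnt_sb;
    bool mnt_readonly;            /* only this mount is read-only */
};

/* A write through this mount is refused if either the superblock or
 * the mount itself is marked read-only. */
static bool ro_mount_is_readonly(const struct ro_vfsmount *mnt)
{
    return mnt->mnt_sb->s_readonly || mnt->mnt_readonly;
}
```

With that, two mounts sharing one writable superblock can disagree:
the bind mount reports read-only while the original stays writable.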

> I hope the confusion has passed for Trond.  My impression was he
> figured this was per process data so it didn't make sense anywhere
> near a filesystem, and the superblock was the last place it should be.

One of the things I said earlier in this thread is that "Both
filesystems _and_ processes should be first-class objects in any UID
namespace".  In order to have sufficient access controls in the
presence of _only_ a UID-namespace (as opposed to with full container
isolation), you need to check against an object *and* the namespace
in which it is located.  In some cases, the object is a file, which
means that the inode, vfsmount, or superblock needs a UID namespace
reference.  Theoretically you could implement per-file UID
namespace pointers, but that would probably be incredibly
inefficient.  IMHO, per-vfsmount gives the best flexibility and
efficiency of the three.
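
As a rough sketch of what "check against an object *and* the
namespace" might look like with the reference kept on the vfsmount:
every type and function name below is hypothetical, and this is a
userspace illustration under those assumptions, not a proposed patch:

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are real kernel types. */
struct ns_uid_namespace { int id; };

struct ns_vfsmount {
    const struct ns_uid_namespace *mnt_uid_ns;  /* namespace of this mount */
};

struct ns_inode {
    unsigned int i_uid;                         /* owner UID stored on disk */
};

struct ns_task {
    const struct ns_uid_namespace *uid_ns;      /* namespace of the process */
    unsigned int fsuid;
};

/* The numeric UID only means something inside its namespace, so an
 * owner match requires both the same namespace and the same UID. */
static bool ns_task_owns_inode(const struct ns_task *task,
                               const struct ns_vfsmount *mnt,
                               const struct ns_inode *inode)
{
    return task->uid_ns == mnt->mnt_uid_ns &&
           task->fsuid == inode->i_uid;
}
```

Because the namespace comes from the mount rather than the inode or
superblock, the same inode reached through two mounts can be owned by
UID 1000 in one namespace and by nobody in the other.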

In fact, it's strange to think about this in context with the rest of
the namespaces that are being designed, but processes would
ordinarily *not* have primary presence in a UID namespace if they
weren't the target of UID-verified operations in and of themselves
(e.g., kill, ptrace, etc).  Otherwise they would just have a set of
(namespace,UID,cap_flags) tuples to give them access to filesystems in
specific uid namespaces.
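
Purely as an illustration of that last idea, a process could carry a
small table of (namespace, UID, cap_flags) entries and look up the
one matching the namespace of whatever filesystem it touches.  The
names and layout here are assumptions of mine, not anything from the
patch set:

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. */
struct cred_uid_namespace { int id; };

struct cred_entry {
    const struct cred_uid_namespace *ns;  /* namespace this entry applies to */
    unsigned int uid;                     /* identity inside that namespace */
    unsigned int cap_flags;               /* capabilities inside that namespace */
};

struct cred_table {
    const struct cred_entry *entries;
    size_t count;
};

/* Return the entry for the given namespace, or NULL if the process
 * has no identity there (and so should get no access). */
static const struct cred_entry *
cred_lookup(const struct cred_table *tbl, const struct cred_uid_namespace *ns)
{
    for (size_t i = 0; i < tbl->count; i++) {
        if (tbl->entries[i].ns == ns)
            return &tbl->entries[i];
    }
    return NULL;
}
```

A permission check would then start from the mount's namespace, call
the lookup, and compare the resulting UID against the inode.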

Cheers,
Kyle Moffett
