linux-kernel - Re: user namespace and fully visible proc and sys mounts


Message-ID: <20160307034516.GA11489@mail.hallyn.com>
Date:	Sun, 6 Mar 2016 21:45:16 -0600
From:	"Serge E. Hallyn" <serge@...lyn.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	"Serge E. Hallyn" <serge@...lyn.com>,
	Serge Hallyn <serge.hallyn@...ntu.com>,
	Seth Forshee <seth.forshee@...onical.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Stéphane Graber <stgraber@...ntu.com>
Subject: Re: user namespace and fully visible proc and sys mounts

On Sun, Mar 06, 2016 at 06:24:23PM -0800, Andy Lutomirski wrote:
> On Mar 6, 2016 2:03 PM, "Eric W. Biederman" <ebiederm@...ssion.com> wrote:
> >
> > "Serge E. Hallyn" <serge.hallyn@...ntu.com> writes:
> >
> > > Hi,
> > >
> > > So we've been over this many times...  but unfortunately there is more
> > > breakage to report.  Regular privileged and unprivileged containers
> > > work all right for us.  But running an unprivileged container inside a
> > > privileged container is blocked.
> > >
> > > When creating privileged containers, lxc by default does a few things:
> > > it mounts some fuse.lxcfs files over proc files, including /proc/meminfo and
> > > /proc/uptime.  It mounts proc rw but /proc/sysrq-trigger ro, and it also
> > > moves /proc/sys/net out of the way, bind-mounts /proc/sys readonly
> > > (because this container is not in a user namespace) then moves
> > > /proc/sys/net back.  Finally it mounts sys ro but bind-mounts
> > > /sys/devices/virtual/net as writeable.
> > >
> > > If any of these are left enabled, unprivileged containers can't be
> > > started.  If all are disabled, then they can be.
> > >
> > > Can we find a way to make these not block remounts in child user
> > > namespaces?  A boot flag, a procfs and sysfs mount option, a sysctl?
> >
> > Are any of these overmounts done for the purpose of security?  It
> > appears the /proc/sys and /sys mounts being made read-only is for that
> > purpose.
> >
> > If none of the mounts are for security, the easy solution that works
> > today is to also mount /proc and /sys somewhere else in your container
> > so that the permission check for mounting a new copy passes.
> 
> Can we use the big hammer approach on /proc/sys?  Specifically, what
> if we made it so that /proc mounts created in a non-root namespace
> *only* see things that are scoped to the active namespaces, and only
> those over which the mounter has capabilities?  We could have mount
> options for this.

Of course the problem is precisely non-user-namespaced containers which
do own and have capabilities over the /proc/sys files.  For user-namespaced
containers /proc/sys/ isn't really an issue.

Better namespacing of sysctls and maybe some way to say "I relinquish
the ability to update *those* sysctls for myself and all children" could
help.

> /proc/sys utterly sucks for namespaced things.  So does the uid_map
> and similar crap.  The API is simply awful.
> 
> On a related note, can we *please* find a way to constrain namespace
> creation in a way that might satisfy the RHEL crowd?
> 
> >
> > That said /proc/sys appears to be a show stopper in this scheme.  As the
> > root of your privileged container can enter your unprivileged container
> > it can bypass your read-only /proc/sys by mounting a new copy of proc if
> > we allow the relaxation you are requesting.
> >
> > Therefore the only choice on the table (and I don't have a clue how
> > realistic it is) is to have a variant of proc with just files describing
> > processes.  Call it processfs.  That would not need the current
> > restrictions.
> >
> > As for sysfs I am drawing a blank about what might be possible.
> 
> Lovely.  Yet another vaguely-namespaced thing in a pseudo-filesystem.
> 
> --Andy
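
To see which of the mounts described at the top of the thread are in play
inside a given container, one rough userspace diagnostic is to list
everything mounted on top of the full proc and sysfs mounts.  The sketch
below is not the kernel's visibility check; it only parses the text of
/proc/self/mountinfo (format as documented in proc(5)) and reports mounts
sitting beneath a proc or sysfs mount whose root is "/".  The function
names parse_mountinfo and find_overmounts are made up for illustration,
and octal escapes in mount points are not decoded.

```python
def parse_mountinfo(text):
    """Parse the text of /proc/<pid>/mountinfo into a list of dicts."""
    mounts = []
    for line in text.splitlines():
        # Fields before " - " are per-mount; fields after are per-superblock.
        left, sep, right = line.partition(" - ")
        fields = left.split()
        tail = right.split()
        if not sep or len(fields) < 6 or not tail:
            continue  # skip blank or malformed lines
        mounts.append({
            "root": fields[3],          # path within the filesystem
            "mount_point": fields[4],   # path within this mount namespace
            "read_only": "ro" in fields[5].split(","),
            "fstype": tail[0],
        })
    return mounts


def find_overmounts(mounts, fstypes=("proc", "sysfs")):
    """Map each full proc/sysfs mount to the mounts layered beneath it.

    A "full" mount here means one whose root is "/", i.e. not a bind
    mount of a subdirectory such as /proc/sys.
    """
    result = {}
    for top in mounts:
        if top["fstype"] not in fstypes or top["root"] != "/":
            continue
        prefix = top["mount_point"].rstrip("/") + "/"
        result[top["mount_point"]] = [
            m for m in mounts if m["mount_point"].startswith(prefix)
        ]
    return result
```

Feeding it the contents of /proc/self/mountinfo from inside the privileged
container should list entries such as the lxcfs files, the read-only
/proc/sys bind mount and the writeable /sys/devices/virtual/net bind mount,
which are the candidates to disable one at a time when testing.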
