linux-kernel - Re: [RFC][PATCH 0/9] Make containers kernel objects


Message-ID: <2446.1495551216@warthog.procyon.org.uk>
Date:   Tue, 23 May 2017 15:53:36 +0100
From:   David Howells <dhowells@...hat.com>
To:     Aleksa Sarai <asarai@...e.de>
Cc:     dhowells@...hat.com,
        James Bottomley <James.Bottomley@...senPartnership.com>,
        trondmy@...marydata.com, mszeredi@...hat.com,
        linux-nfs@...r.kernel.org, jlayton@...hat.com,
        Linux Containers <containers@...ts.linux-foundation.org>,
        linux-kernel@...r.kernel.org, viro@...iv.linux.org.uk,
        linux-fsdevel@...r.kernel.org, cgroups@...r.kernel.org,
        ebiederm@...ssion.com
Subject: Re: [RFC][PATCH 0/9] Make containers kernel objects

Aleksa Sarai <asarai@...e.de> wrote:

> >> The reason I think this is necessary is that the kernel has no idea
> >> how to direct upcalls to what userspace considers to be a container -
> >> current Linux practice appears to make a "container" just an
> >> arbitrarily chosen junction of namespaces, control groups and files,
> >> which may be changed individually within the "container".
> 
> Just want to point out that if the kernel APIs for containers massively
> change, then the OCI will have to completely rework how we describe containers
> (and so will all existing runtimes).
> 
> Not to mention that while I don't like how hard it is (from a runtime
> perspective) to actually set up a container securely, there are undoubtedly
> benefits to having namespaces split out. The network namespace being separate
> means that in certain contexts you actually don't want to create a new network
> namespace when creating a container.

Yep, I quite agree.

However, certain things that need to be per-network-namespace currently
*aren't*.  DNS results, for instance.

As an example, I could set up a client machine with two ethernet ports, set up
two DNS+NFS servers, each of which thinks it's called "foo.bar", and attach
each server to a different port on the client machine.  Then I could create a
pair of containers on the client machine and route the network in each
container to a different port.  Now there's a problem because the names of the
cached DNS records for each port overlap.
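
To make the overlap concrete, here is a rough userspace sketch of the two
caching schemes.  It is only an illustration, not kernel code: the class
names, the "netns-a"/"netns-b" labels and the addresses (taken from the
documentation ranges) are all made up for the example.

```python
# Illustrative sketch only: these classes are hypothetical and do not
# correspond to any kernel or libc API.

class GlobalDnsCache:
    """Cache keyed by hostname alone, shared by every network namespace."""

    def __init__(self):
        self._records = {}

    def store(self, netns, hostname, address):
        # The namespace is ignored, so a later entry for the same name
        # overwrites an earlier one from a different network.
        self._records[hostname] = address

    def lookup(self, netns, hostname):
        return self._records.get(hostname)


class PerNetnsDnsCache:
    """Cache keyed by (network namespace, hostname)."""

    def __init__(self):
        self._records = {}

    def store(self, netns, hostname, address):
        self._records[(netns, hostname)] = address

    def lookup(self, netns, hostname):
        return self._records.get((netns, hostname))


def resolve_both(cache):
    """Two containers, each on its own network, both resolve "foo.bar"."""
    # Hypothetical addresses: one "foo.bar" server per ethernet port.
    cache.store("netns-a", "foo.bar", "192.0.2.1")
    cache.store("netns-b", "foo.bar", "198.51.100.1")
    return (cache.lookup("netns-a", "foo.bar"),
            cache.lookup("netns-b", "foo.bar"))


if __name__ == "__main__":
    print(resolve_both(GlobalDnsCache()))
    print(resolve_both(PerNetnsDnsCache()))
```

With the name-only cache, the first container ends up with the address of the
other network's server; keying on the namespace as well keeps the two records
apart.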

Further, the NFS idmapper needs to be able to direct its calls to the
appropriate network.

> I had some ideas about how you could implement bridging in userspace (as an
> unprivileged user, for rootless containers) but if you can't join namespaces
> individually then such a setup is not practically possible.

I'm not proposing to take away the ability to arbitrarily set the namespaces
in a container.  I haven't implemented it yet, but it was on the to-do list:

 (7) Directly set a container's namespaces to allow cross-container
     sharing.

David