lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 1 Jun 2017 11:35:28 -0400
From:   Tejun Heo <tj@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Waiman Long <longman@...hat.com>, Li Zefan <lizefan@...wei.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Ingo Molnar <mingo@...hat.com>, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-mm@...ck.org, kernel-team@...com, pjt@...gle.com,
        luto@...capital.net, efault@....de
Subject: Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

Hello, Peter.

On Thu, Jun 01, 2017 at 05:10:45PM +0200, Peter Zijlstra wrote:
> I've not had time to look at any of this. But the question I'm most
> curious about is how cgroup-v2 preserves the container invariant.
> 
> That is, each container (namespace) should look like a 'real' machine.
> So just like userns allows to have a uid-0 (aka root) for each container
> and pidns allows a pid-1 for each container, cgroupns should provide a
> root group for each container.
> 
> And cgroup-v2 has this 'exception' (aka wart) for the root group which
> needs to be replicated for each namespace.

The goal has never been that a container must be indistinguishible
from a real machine.  For certain things, things simply don't have
exact equivalents due to sharing (memory stats or journal writes for
example) and those things are exactly why people prefer containers
over VMs for certain use cases.  If one wants full replication, VM
would be the way to go.

The goal is allowing enough container invariant so that appropriate
workloads can be contained and co-exist in useful ways.  This also
means that the contained workload is usually either a bit illiterate
w.r.t. to the system details (doesn't care) or makes some adjustments
for running inside a container (most quasi-full-system ones already
do).

System root is inherently different from all other nested roots.
Making some exceptions for the root isn't about taking away from other
roots but more reflecting the inherent differences - there are things
which are inherently system / bare-metal.

Thanks.

-- 
tejun

Powered by blists - more mailing lists