linux-kernel - [PATCH 0/6] Multi-hierarchy Process Containers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20061117191159.151894000@menage.corp.google.com>
Date:	Fri, 17 Nov 2006 11:11:59 -0800
From:	menage@...gle.com
To:	akpm@...l.org, pj@....com, sekharan@...ibm.com
Cc:	ckrm-tech@...ts.sourceforge.net, jlan@....com, simon.derr@...l.net,
	linux-kernel@...r.kernel.org, mbligh@...gle.com, winget@...gle.com,
	rohitseth@...gle.com
Subject: [PATCH 0/6]  Multi-hierarchy Process Containers

This is an update to my generic containers patch. The major change is
support for multiple hierarchies of containers (up to a limit
specified at build time).

- The mount options passed when mounting a container filesystem
  indicate the set of controllers/subsystems that are wanted in the
  hierarchy - e.g. "mount -t container -o cpuset,numtasks container /foo"

- Default is to try to mount all subsystems

- if a hierarchy with the requested set of subsystems already exists
  then its superblock is reused

- otherwise (as long as all the requested subsystems are currently not
  in use in any hierarchy) a new hierarchy is created.

- hierarchies with more than one container (i.e. with any children of
  the root container) persist even when unmounted;

- /proc/containers shows current hierarchy/subsystem details

- /proc/<pid>/container shows one line for each active hierarchy

Other changes include:

- ported to 2.6.19-rc5 

- per-subsystem/per-container state is no longer just a void * - it
  has some state maintained by the container framework (to handle
  moving subsystems in and out of hierarchies when they are created/released)

Note that this hasn't yet undergone intensive testing following the
multi-hierarchy introduction, but I wanted to get the basic idea out
for comments.

TODOs include:

- figuring out a nice way to handle release notifications now that
  there are multiple hierarchies

-------------------------------------

There have recently been various proposals floating around for
resource management/accounting subsystems in the kernel, including
Res Groups, User BeanCounters and others.  These all need the basic
abstraction of being able to group together multiple processes in an
aggregate, in order to track/limit the resources permitted to those
processes, and all implement this grouping in different ways.

Already existing in the kernel is the cpuset subsystem; this has a
process grouping mechanism that is mature, tested, and well documented
(particularly with regards to synchronization rules).

This patchset extracts the process grouping code from cpusets into a
generic container system, and makes the cpusets code a client of
the container system.

It also provides a very simple additional container subsystem to do
per-container CPU usage accounting; this is primarily to demonstrate
use of the container subsystem API, but is useful in its own right.

The change is implemented in five stages plus an additional example patch:

1) extract the process grouping code from cpusets into a standalone system

2) remove the process grouping code from cpusets and hook into the
   container system

3) convert the container system to present a generic multi-hierarchy
   API, and make cpusets a client of that API

4) add a simple CPU accounting container subsystem as an example

5) add support for fork/exit callbacks iff some subsystem is interested in them

6) example of implementing ResGroups and its numtasks controller over
   generic containers - not intended to be applied with this patch set

The intention is that the various resource management efforts can also
become container clients, with the result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test out e.g. the ResGroups CPU controller in
 conjunction with the UBC memory controller

- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel

Signed-off-by: Paul Menage <menage@...gle.com>
--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/