Date:	Tue, 31 Oct 2006 20:39:27 -0800 (PST)
From:	David Rientjes <rientjes@...washington.edu>
To:	Paul Menage <menage@...gle.com>
cc:	Paul Jackson <pj@....com>, dev@...nvz.org, vatsa@...ibm.com,
	sekharan@...ibm.com, ckrm-tech@...ts.sourceforge.net,
	balbir@...ibm.com, haveblue@...ibm.com,
	linux-kernel@...r.kernel.org, matthltc@...ibm.com,
	dipankar@...ibm.com, rohitseth@...gle.com
Subject: Re: [ckrm-tech] [RFC] Resource Management - Infrastructure choices

On Mon, 30 Oct 2006, Paul Menage wrote:

> More or less. More concretely:
> 
> - there is a single hierarchy of process containers
> - each process is a member of exactly one process container
> 
> - for each resource controller, there's a hierarchy of resource "nodes"
> - each process container is associated with exactly one resource node
> of each type
> 
> - by default, the process container hierarchy and the resource node
> hierarchies are isomorphic, but that can be controlled by userspace.
> 

This approach appears to be the most complete and extensible 
implementation of containers for all practical uses.  Not only can these 
process containers be used in conjunction with your choice of memory 
controllers, network controllers, disk I/O controllers, etc, but you can 
also pick and choose modular controllers to meet your needs.
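
To make that concrete, here is a minimal sketch in C of the data model as 
I read it.  All of the names (process_container, resource_node, and so 
on) are mine, invented purely for illustration; they are not from any 
posted patch:

/* Illustrative only; every name here is invented, not from a patch. */

#define MAX_CONTROLLERS	8	/* memory, network, disk I/O, ... */

/* One node in a controller's own hierarchy (e.g. D and E for memory). */
struct resource_node {
	struct resource_node *parent;
	/* controller-specific limits/guarantees hang off of here */
};

/*
 * A process container (A, B, C): associated with exactly one resource
 * node of each controller type.
 */
struct process_container {
	struct process_container *parent;
	struct resource_node *nodes[MAX_CONTROLLERS];
};

/* Each task is a member of exactly one process container. */
struct task {
	struct process_container *container;
	/* ... */
};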

So here are our three process containers, A, B, and C, with our tasks m-t:

	-----A-----	-----B-----	-----C-----
	|    |    |     |    |    |     |    |
	m    n    o	p    q    r	s    t

Here are our memory controller groups D and E and the containers placed 
within them:

	-----D-----	-----E-----
	|         |	|
	A         B	C

 [ My memory controller E is for my real-time processes so I set its
   attributes appropriately so that it never breaks. ]

And our network controller groups F, G, and H:

        -----F-----     -----G-----
                        |         |
                   -----H-----    C
                   |         |
                   A         B

 [ I'm going to leave my network controller F open for my customer's
   WWW browsing, but nobody is using it right now. ]

I choose not to control disk I/O, so there is no change from current 
behavior for any of the processes listed above.

There are two things I notice about this approach (my use of the word 
"container" refers to the process containers A, B, and C; my use of the 
word "controller" refers to the memory, disk I/O, network, etc. 
controllers):

 - While the process containers are only single-level, the controllers are
   _inherently_ hierarchical, just like a filesystem.  So it appears that
   the manipulation of these controllers would most effectively be done
   from userspace with a filesystem approach.  While forcing
   CONFIG_CONFIGFS_FS to be enabled may not serve that purpose, I see no
   objection to giving it its own filesystem capability in the kernel,
   apart from configfs.  The filesystem manipulation tools that everybody
   is familiar with make controllers simple to implement and, more
   importantly, easier to _use_ (see the sketch after this list).

 - The process containers will need to be set up as desired following
   boot.  So if the current approach of cpusets is used, where the
   functionality is enabled on mount, all processes will originally belong
   to the default container that encompasses the entire system.  Since
   each process must belong to exactly one process container, as per Paul
   Menage's proposal, a new container will need to be created and
   processes _moved_ to it for later use by controllers.  So it appears
   that the manipulation of containers would most effectively be done from
   userspace by a syscall approach (also sketched below).
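
To illustrate the split, here is a hedged userspace sketch of both 
halves.  The mount point, the attribute file name, and the 
container_attach() syscall are all invented for the example; none of 
them exist anywhere:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Controller side: ordinary filesystem tools build the hierarchy. */
static void setup_memory_group_E(void)
{
	FILE *f;

	/* Create memory controller group E (path is hypothetical). */
	mkdir("/controllers/memory/E", 0755);

	/* Set an attribute so my real-time processes are never starved. */
	f = fopen("/controllers/memory/E/guarantee_in_bytes", "w");
	if (f) {
		fputs("536870912", f);
		fclose(f);
	}
}

/* Container side: a syscall moves tasks between process containers. */
static void move_task_to_container_C(pid_t pid)
{
	(void)pid;
	/* container_attach(pid, "C");  -- hypothetical syscall */
}

int main(void)
{
	setup_memory_group_E();
	move_task_to_container_C(1234);	/* example pid */
	return 0;
}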

In this scenario, it is not necessary for the limits (or guarantees) of 
network controller groups F and G above to add up to 100% of our network 
load.  It is quite possible that we do not assign every container to a 
network controller group, so that unassigned containers receive the 
remainder of the bandwidth that is not already attributed to F and G.  
The same is true of any controller.  Controllers should only limit or 
guarantee a certain amount of resources; they should not force every 
process on the system to be a member of one group or another in order to 
receive resources.

Two questions also arise:

 - Why do I need to create (i.e. mount the filesystem) the container in
   the first place?  Since the use of these containers rests entirely on 
   the shoulders of the optional controllers, there should be no 
   interference with current behavior if I choose not to use any 
   controller.  So why not take the approach that NUMA did, where on a 
   UMA machine all of memory belongs to node 0?  In our case, all 
   processes would inherently belong to a system-wide container, similar 
   to procfs.  In fact, procfs is how this could be implemented apart 
   from configfs, following the criticism from UBC.

 - How is forking handled with the various controllers?  Do child 
   processes automatically inherit all the controller groups of their
   parent?  If not (or if it depends on a user-configured attribute
   of the controller), what happens when I want processes forked from
   container A in the illustration above to belong to a new network
   controller group?  Certainly that new group cannot be
   created as a sibling of F and G; and determining the network limit
   for a third child of H would be non-trivial, because the
   network resources allocated to A and B would then be scaled back,
   perhaps in an undesired manner (see the fork sketch after this list).
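
On the fork question in particular, the natural default (and what cpusets 
does today) would be for the child to start in its parent's container, 
and therefore in all of the parent's controller groups, with any 
regrouping done explicitly afterwards.  A sketch, with container_attach() 
and the group name "A-prime" again purely hypothetical:

#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		/*
		 * Child: by default it inherits the parent's process
		 * container, and with it the parent's memory, network,
		 * and disk I/O groups.
		 *
		 * To put it in a *new* network controller group, something
		 * must move it explicitly, e.g. a hypothetical
		 *
		 *	container_attach(getpid(), "A-prime");
		 *
		 * and that is exactly where the sibling-of-F-and-G versus
		 * third-child-of-H question above bites.
		 */
		_exit(0);
	}
	return 0;
}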

So the container abstraction looks appropriate for a syscall interface, 
whereas the controller abstraction looks appropriate for a filesystem 
interface.  If Paul Menage's proposal above is adopted, it seems like 
the design and implementation of containers is the first milestone; how 
far does the current patchset get us toward what is described above?  
Does it still support a hierarchy just like cpusets?

And following that, it seems like the next milestone would be to design 
the different characteristics that the various modular controllers could 
support, such as notify_on_release, limits/guarantees, behavior on fork, 
and privileges.
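
A hypothetical sketch of what such a per-controller description might 
look like, purely to frame the design discussion; none of these names 
exist in any patchset:

/* Hypothetical: one possible shape for a modular controller. */
struct container;	/* opaque process container */

struct controller_ops {
	const char *name;		/* "memory", "net", ... */
	int notify_on_release;		/* run a hook when a group empties? */
	int (*set_limit)(struct container *c, unsigned long val);
	int (*set_guarantee)(struct container *c, unsigned long val);
	void (*fork)(struct container *parent, struct container *child);
	int (*can_attach)(struct container *c);	/* privilege check */
};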

		David
