Message-Id: <20070324202951.38f7187a.akpm@linux-foundation.org>
Date: Sat, 24 Mar 2007 20:29:51 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Herbert Poetzl <herbert@...hfloor.at>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
containers@...ts.osdl.org, linux-kernel@...r.kernel.org,
Dave Hansen <hansendc@...ibm.com>
Subject: Re: Linux-VServer example results for sharing vs. separate mappings
...
On Sun, 25 Mar 2007 04:21:56 +0200 Herbert Poetzl <herbert@...hfloor.at> wrote:
> > a) slice the machine into 128 fake NUMA nodes, use each node as the
> > basic block of memory allocation, manage the binding between these
> > memory hunks and process groups with cpusets.
>
> 128 sounds a little small to me, considering that we
> already see 300+ Guests on older machines ....
> (or am I missing something here?)
Yes, you're missing something very significant. I'm talking about resource
management (ie: partitioning) and you're talking about virtual servers.
They're different applications, with quite a lot in common.
For resource management, a few tens of containers is probably an upper
bound.
An implementation needs to address both requirements.
> > This is what google are testing, and it works.
> >
> > b) Create a new memory abstraction, call it the "software zone",
> > which is mostly decoupled from the present "hardware zones". Most of
> > the MM is reworked to use "software zones". The "software zones" are
> > runtime-resizeable, and obtain their pages via some means from the
> > hardware zones. A container uses a software zone.
> >
> > c) Something else, similar to the above. Various schemes can be
> > envisaged, it isn't terribly important for this discussion.
>
> for me, the most natural approach is the one with
> the least impact and smallest number of changes
> in the (granted quite complex) system: leave
> everything as is, from the 'entire system' point
> of view, and do adjustments and decisions with the
> additional Guest/Context information in mind ...
>
> e.g. if we decide to reclaim pages, and the 'normal'
> mechanism would end up with 100 'equal' candidates,
> the Guest badness can be a good additional criterion
> to decide which pages get thrown out ...
>
> OTOH, the Guest status should never control the
> entire system behaviour in a way which harms the
> overall performance or resource efficiency
On the contrary - if one container exceeds its allotted resource, we want
the processes in that container to bear the majority of the cost of that.
Ideally, all of the cost.
>
> > All doable, if we indeed have a demonstrable problem
> > which needs to be addressed.
>
> all in all I seem to be missing the 'original problem'
> which basically forces us to do all those things you
> describe instead of letting the Linux Memory System
> work as it works right now and just get the accounting
> right ...
The VM presently cannot satisfy resource management requirements, because
piggy activity from one job will impact the performance of all other jobs.
> > > note that the 'frowned upon' accounting Linux-VServer
> > > does seems to work for those cases quite fine .. here
> > > the relevant accounting/limits for three guests, the
> > > first two unified and started in strict sequence, the
> > > third one completely separate
> > >
> > > Limit     current      min/max    soft/hard   hits
> > > VM:         41739    0/ 64023       -1/ -1      0
> > > RSS:         8073    0/  9222       -1/ -1      0
> > > ANON:        3110    0/  3405       -1/ -1      0
> > > RMAP:        4960    0/  5889       -1/ -1      0
> > > SHM:         7138    0/  7138       -1/ -1      0
> > >
> > > Limit     current      min/max    soft/hard   hits
> > > VM:         41738    0/ 64163       -1/ -1      0
> > > RSS:         8058    0/  9383       -1/ -1      0
> > > ANON:        3108    0/  3505       -1/ -1      0
> > > RMAP:        4950    0/  5912       -1/ -1      0
> > > SHM:         7138    0/  7138       -1/ -1      0
> > >
> > > Limit     current      min/max    soft/hard   hits
> > > VM:         41738    0/ 63912       -1/ -1      0
> > > RSS:         8050    0/  9211       -1/ -1      0
> > > ANON:        3104    0/  3399       -1/ -1      0
> > > RMAP:        4946    0/  5885       -1/ -1      0
> > > SHM:         7138    0/  7138       -1/ -1      0
> >
> > Sorry, I tend to go to sleep when presented with rows and rows of
> > numbers. Sure, it's good to show the data but I much prefer it if the
> > sender can tell us what the data means: the executive summary.
>
> sorry, I'm more the technical person and I hate
> 'executive summaries' and similar stuff, but the
message is simple and clear: accounting works even
> for shared/unified guests, all three guests show
> reasonably similar values ...
I don't see "accounting" as being useful for resource management.  I mean,
so we have a bunch of numbers - so what?
The problem is: what do we do when the jobs in a container exceed their
allotment?
With zone-based physical containers we already have pretty much all the
accounting we need, in the existing per-zone accounting.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/