linux-kernel - Re: [ckrm-tech] RFC: Memory Controller

Message-ID: <6599ad830610301014l1bf78ce8q998229483d055a90@mail.gmail.com>
Date:	Mon, 30 Oct 2006 10:14:44 -0800
From:	"Paul Menage" <menage@...gle.com>
To:	balbir@...ibm.com
Cc:	dev@...nvz.org, vatsa@...ibm.com, sekharan@...ibm.com,
	ckrm-tech@...ts.sourceforge.net, haveblue@...ibm.com,
	linux-kernel@...r.kernel.org, pj@....com, matthltc@...ibm.com,
	dipankar@...ibm.com, rohitseth@...gle.com
Subject: Re: [ckrm-tech] RFC: Memory Controller

On 10/30/06, Balbir Singh <balbir@...ibm.com> wrote:
>
> You'll also end up with per zone page cache pools for each zone. A list of
> active/inactive pages per zone (which will split up the global LRU list).

Yes, these are some of the inefficiencies that we're ironing out.

> What about the hard-partitioning? If a container/cpuset is not using its full
> 64MB of a fake node, can some other node use it?

No. So the granularity at which you can divide up the system depends
on how big your fake nodes are. For our purposes, we figure that 64MB
granularity should be fine.
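
To make the granularity point concrete, here is a minimal sketch of the
arithmetic, assuming a container's limit is met by handing it whole 64MB
fake nodes. The function names are hypothetical and are not part of any
kernel or cpuset interface.

```python
# Sketch only: illustrates the rounding cost of fixed-size fake nodes.
# FAKE_NODE_MB comes from the 64MB figure above; everything else is assumed.
FAKE_NODE_MB = 64


def fake_nodes_needed(limit_mb: int, node_mb: int = FAKE_NODE_MB) -> int:
    """Return how many whole fake nodes are needed to cover limit_mb."""
    if limit_mb <= 0 or node_mb <= 0:
        raise ValueError("limit_mb and node_mb must be positive")
    # Integer ceiling division: nodes cannot be split between containers.
    return -(-limit_mb // node_mb)


def stranded_mb(limit_mb: int, node_mb: int = FAKE_NODE_MB) -> int:
    """Return memory assigned beyond the requested limit due to rounding."""
    return fake_nodes_needed(limit_mb, node_mb) * node_mb - limit_mb


if __name__ == "__main__":
    for limit in (64, 100, 500):
        print(limit, fake_nodes_needed(limit), stranded_mb(limit))
```

A job that wants 100MB would be given two fake nodes (128MB), and the
extra 28MB cannot be lent to anyone else while it holds them.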

> Also, won't you end up
> with a big zonelist?

Yes - but PaulJ's recent patch to speed up the zone selection helped
reduce the overhead of this a lot.

>
> Consider the other side of the story. Let's say we have a shared lib shared
> among quite a few containers. We limit the usage of the inode containing
> the shared library to 50M. Tasks A and B use some part of the library
> and cause the container "C" to reach the limit. Container C is charged
> for all usage of the shared library. Now no other task, irrespective of which
> container it belongs to, can touch any new pages of the shared library.

Well, if the pages aren't mlocked then presumably some of the existing
pages can be flushed out to disk and replaced with other pages.

>
> What you are suggesting is to virtually group the inodes by container rather
> than task. It might make sense in some cases, but not all.

Right - I think it's an important feature to be able to support, but I
agree that it's not suitable for all situations.

>
> We could consider implementing the controllers in phases
>
> 1. RSS control (anon + mapped pages)
> 2. Page Cache control

Page cache control is actually more essential than RSS control, in our
experience - it's pretty easy to track RSS values from userspace, and
react reasonably quickly to kill things that go over their limit, but
determining page cache usage (i.e. determining which job on the system
is flooding the page cache with dirty buffers) is pretty much
impossible currently.
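
As a rough illustration of the userspace RSS tracking described above,
here is a minimal sketch that reads the VmRSS field from
/proc/<pid>/status and kills a job's processes once their combined RSS
exceeds a limit. The helper names, and the idea of describing a job as a
plain list of pids, are assumptions made for the example. Summing
per-process RSS also counts shared pages more than once, so treat the
total as an upper bound.

```python
import os
import signal


def parse_vmrss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads report no VmRSS line


def read_rss_kb(pid):
    """Read the current RSS of one process; 0 if it has no VmRSS."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read()) or 0


def job_rss_kb(pids):
    """Sum RSS over a job's pids, skipping processes that have exited."""
    total = 0
    for pid in pids:
        try:
            total += read_rss_kb(pid)
        except (FileNotFoundError, ProcessLookupError):
            continue
    return total


def enforce_limit(pids, limit_kb):
    """Kill every pid in the job if the job is over limit_kb."""
    if job_rss_kb(pids) <= limit_kb:
        return False
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone
    return True
```

A monitor would call enforce_limit() in a polling loop. Nothing
comparable is available for the page cache, since there is no per-task
or per-job value to read.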

Paul