Message-ID: <6599ad830610301014l1bf78ce8q998229483d055a90@mail.gmail.com>
Date:	Mon, 30 Oct 2006 10:14:44 -0800
From:	"Paul Menage" <menage@...gle.com>
To:	balbir@...ibm.com
Cc:	dev@...nvz.org, vatsa@...ibm.com, sekharan@...ibm.com,
	ckrm-tech@...ts.sourceforge.net, haveblue@...ibm.com,
	linux-kernel@...r.kernel.org, pj@....com, matthltc@...ibm.com,
	dipankar@...ibm.com, rohitseth@...gle.com
Subject: Re: [ckrm-tech] RFC: Memory Controller

On 10/30/06, Balbir Singh <balbir@...ibm.com> wrote:
>
> You'll also end up with per-zone page cache pools, and a list of
> active/inactive pages per zone (which will split up the global LRU list).

Yes, these are some of the inefficiencies that we're ironing out.
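
To make the data-structure point concrete, here's a toy userspace model
(just my sketch, not kernel code) of what the split looks like: each fake
node's zone keeps its own active/inactive LRU counts, and the old global
view only exists as a sum over zones.

#include <stdio.h>

#define NR_FAKE_NODES 4

/* One active/inactive pair per zone instead of one global pair. */
struct toy_zone {
        unsigned long nr_active;
        unsigned long nr_inactive;
};

int main(void)
{
        struct toy_zone zones[NR_FAKE_NODES] = {
                { 900, 100 }, { 400, 600 }, { 50, 950 }, { 0, 0 }
        };
        unsigned long active = 0, inactive = 0;
        int i;

        for (i = 0; i < NR_FAKE_NODES; i++) {
                printf("node %d: active=%lu inactive=%lu\n",
                       i, zones[i].nr_active, zones[i].nr_inactive);
                active += zones[i].nr_active;
                inactive += zones[i].nr_inactive;
        }
        /* The "global LRU" picture is now only recoverable by summing. */
        printf("global: active=%lu inactive=%lu\n", active, inactive);
        return 0;
}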

> What about hard-partitioning? If a container/cpuset is not using its full
> 64MB of a fake node, can some other node use it?

No. So the granularity at which you can divide up the system depends
on how big your fake nodes are. For our purposes, we figure that 64MB
granularity should be fine.

> Also, won't you end up
> with a big zonelist?

Yes - but PaulJ's recent patch to speed up the zone selection helped
reduce the overhead of this a lot.
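
For a rough sense of scale (the 16GB machine size below is an assumption
for illustration; only the 64MB node size comes from above):

#include <stdio.h>

int main(void)
{
        unsigned long ram_mb       = 16 * 1024; /* assumed machine size */
        unsigned long fake_node_mb = 64;        /* granularity discussed above */
        unsigned long nr_nodes     = ram_mb / fake_node_mb;

        printf("%lu MB / %lu MB per fake node = %lu fake nodes\n",
               ram_mb, fake_node_mb, nr_nodes);
        /* Worst case an allocation may have to consider a zone from every
           node, so the zonelist to walk is of the same order. */
        printf("zonelist entries to scan (worst case): ~%lu\n", nr_nodes);
        /* A container capped at 512 MB would be bound to 512/64 nodes. */
        printf("512 MB container -> %lu fake nodes\n", 512UL / fake_node_mb);
        return 0;
}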

>
> Consider the other side of the story. Let's say we have a shared lib shared
> among quite a few containers. We limit the usage of the inode containing
> the shared library to 50M. Tasks A and B use some part of the library
> and cause the container "C" to reach the limit. Container C is charged
> for all usage of the shared library. Now no other task, irrespective of which
> container it belongs to, can touch any new pages of the shared library.

Well, if the pages aren't mlocked then presumably some of the existing
pages can be flushed out to disk and replaced with other pages.

>
> What you are suggesting is to virtually group the inodes by container rather
> than task. It might make sense in some cases, but not all.

Right - I think it's an important feature to be able to support, but I
agree that it's not suitable for all situations.

>
> We could consider implementing the controllers in phases
>
> 1. RSS control (anon + mapped pages)
> 2. Page Cache control

Page cache control is actually more essential than RSS control, in our
experience - it's pretty easy to track RSS values from userspace, and
react reasonably quickly to kill things that go over their limit, but
determining page cache usage (i.e. determining which job on the system
is flooding the page cache with dirty buffers) is pretty much
impossible currently.
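
To illustrate the "easy from userspace" half of that, here's a minimal
sketch of an RSS check (the /proc/<pid>/statm fields are standard procfs;
the 64MB limit and the kill/throttle policy are made-up placeholders).
There's no equivalent per-job view of page cache usage, which is the gap
described above.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns RSS in bytes for the given pid, or 0 on error. */
static unsigned long rss_bytes(pid_t pid)
{
        char path[64];
        unsigned long size_pages = 0, resident_pages = 0;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/statm", (int)pid);
        f = fopen(path, "r");
        if (!f)
                return 0;
        /* statm: total program size, then resident set size, in pages. */
        if (fscanf(f, "%lu %lu", &size_pages, &resident_pages) != 2)
                resident_pages = 0;
        fclose(f);
        return resident_pages * (unsigned long)sysconf(_SC_PAGESIZE);
}

int main(void)
{
        unsigned long limit = 64UL * 1024 * 1024;  /* assumed 64MB limit */
        unsigned long rss = rss_bytes(getpid());   /* watch ourselves as a demo */

        printf("rss=%lu bytes, limit=%lu bytes\n", rss, limit);
        if (rss > limit)
                printf("over limit: a monitor could kill or throttle the task here\n");
        return 0;
}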

Paul
