linux-kernel - Re: [RFC][0/3] Virtual address space control for cgroups (v2)

Message-ID: <6599ad830803270728y354b567s7bfe8cb7472aa065@mail.gmail.com>
Date:	Thu, 27 Mar 2008 07:28:18 -0700
From:	"Paul Menage" <menage@...gle.com>
To:	balbir@...ux.vnet.ibm.com
Cc:	"Andrew Morton" <akpm@...ux-foundation.org>,
	"Pavel Emelianov" <xemul@...nvz.org>,
	"Hugh Dickins" <hugh@...itas.com>,
	"Sudhir Kumar" <skumar@...ux.vnet.ibm.com>,
	"YAMAMOTO Takashi" <yamamoto@...inux.co.jp>, lizf@...fujitsu.com,
	linux-kernel@...r.kernel.org, taka@...inux.co.jp,
	linux-mm@...ck.org, "David Rientjes" <rientjes@...gle.com>,
	"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [RFC][0/3] Virtual address space control for cgroups (v2)

On Thu, Mar 27, 2008 at 1:04 AM, Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
>
>  I thought I addressed some of those by adding a separate config option. You
>  could enable just the address space control, by leaving memory.limit_in_bytes at
>  the maximum value it is at (at the moment).
>

Having a config option is better than none at all, certainly for
people who roll their own kernels. But what config choice should a
distro make when they're deciding what to build into their kernel
configuration?

It's much easier to decide to build in a feature that can be ignored
by those who don't use it.

>
>  Yes, I agree with the overhead philosophy. I suspect that users will enable
>  both. I am not against making it a separate controller. I am still hopeful of
>  getting the mm->owner approach working
>

I was thinking more about that, and I think I found a possibly fatal flaw:


>
>  >
>  > Trying to account/control physical memory or swap usage via virtual
>  > address space limits is IMO a hopeless task. Taking Google's
>  > production clusters and the virtual server systems that I worked on in
>  > my previous job as real-life examples that I've encountered, there's
>  > far too much variety of application behaviour (including Java apps
>  > that have massive sparse heaps, jobs with lots of forked children
>  > sharing pages but not address spaces with their parents, and multiple
>  > serving processes mapping large shared data repositories from SHM
>  > segments) that saying VA = RAM + swap is going to break lots of jobs.
>  > But pushing up the VA limit massively makes it useless for the purpose
>  > of preventing excessive swapping. If you want to prevent excessive
>  > swap space usage without breaking a large class of apps, you need to
>  > limit swap space, not virtual address space.
>  >
>  > Additionally, you suggested that VA limits provide a "soft-landing".
>  > But I think that the number of applications that will do much other
>  > than abort() if mmap() returns ENOMEM is extremely small - I'd be
>  > interested to hear if you know of any.
>  >
>
>  What happens if swap is completely disabled? Should the task running be OOM
>  killed in the container?

Yes, I think so.

>  How does the application get to know that it is
>  reaching its limit?

That's something that needs to be addressed outside of the concept of
cgroups too.

> I suspect the system administrator will consider
>  vm.overcommit_ratio while setting up virtual address space limits and real page
>  usage limit. As far as applications failing gracefully is concerned, my opinion is
>
>  1. Let's not let bad applications dictate the design of our features
>  2. Autonomic computing is forcing applications to see what resources
>  they do have access to

Yes, you're right - I shouldn't be arguing this based on current apps,
I should be thinking of the potential for future apps.

>  3. Swapping is expensive, so most application developers I have spoken to at
>  conferences recently state that they can manage their own memory, provided
>  they are given sufficient hints from the OS. An mmap() failure, for example can
>  force the application to free memory it is not currently using or trigger the
>  garbage collector in a managed environment.

But the problem that I have with this is that mmap() is only very
loosely connected with physical memory. If we're trying to help
applications avoid swapping, and giving them advance warning that
they're running out of physical memory, then we should do exactly
that, not try to treat address space as a proxy for physical memory.
For apps where there's a close correspondence between virtual address
space and physical memory, this should work equally well. For apps
that use a lot more virtual address space than physical memory this
should work much better.
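
To illustrate how loosely mmap() tracks physical memory, here is a
minimal sketch, assuming Linux (it reads /proc/self/status) and
Python's standard mmap module. The function names are hypothetical and
not taken from any patch in this thread. It maps a large anonymous
region, touches only a few pages, and reports how much the virtual
size and the resident set each grew:

```python
import mmap


def read_status_kb(field):
    """Return a field such as VmSize or VmRSS from /proc/self/status, in kB."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)


def sparse_mapping_demo(size=256 * 1024 * 1024, pages_touched=16):
    """Map `size` bytes, touch a few pages, return (VA growth, RSS growth) in bytes.

    Hypothetical helper for illustration only.
    """
    vm_before = read_status_kb("VmSize")
    rss_before = read_status_kb("VmRSS")

    # Private anonymous mapping: address space is reserved immediately,
    # but physical pages are only allocated when they are first touched.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        for page in range(pages_touched):
            region[page * mmap.PAGESIZE] = 1  # fault in one page
        vm_after = read_status_kb("VmSize")
        rss_after = read_status_kb("VmRSS")
    finally:
        region.close()

    return (vm_after - vm_before) * 1024, (rss_after - rss_before) * 1024


if __name__ == "__main__":
    va_growth, rss_growth = sparse_mapping_demo()
    print("virtual address space grew by", va_growth, "bytes")
    print("resident set grew by", rss_growth, "bytes")
```

On a typical system the virtual size should grow by roughly the whole
mapping while the resident set grows by only a small fraction of it. A
VA limit would charge this process for the full mapping, whereas a
physical memory limit would charge it only for the pages it actually
uses.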

Paul