Date:	Wed, 09 Aug 2006 11:54:07 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Martin Bligh <mbligh@...igh.org>
CC:	rohitseth@...gle.com, Dave Hansen <haveblue@...ibm.com>,
	Kirill Korotaev <dev@...ru>, vatsa@...ibm.com,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Andrew Morton <akpm@...l.org>, mingo@...e.hu, sam@...ain.net,
	linux-kernel@...r.kernel.org, dev@...nvz.org, efault@....de,
	balbir@...ibm.com, sekharan@...ibm.com, nagar@...son.ibm.com,
	pj@....com, Andrey Savochkin <saw@...ru>
Subject: Re: memory resource accounting (was Re: [RFC, PATCH 0/5] Going
 forward with Resource Management - A cpu controller)

Martin Bligh wrote:

>>> It also saves you from maintaining huge lists against each page.
>>>
>>> Worst case, you want to bill everyone who opens that address_space
>>> equally. But the semantics on exit still suck.
>>>
>>> What was Alan's quote again? "unfair, unreliable, inefficient ...
>>> pick at least one out of the three". or something like that.
>>
>>
>> What are the sucking semantics on exit? I haven't looked much at the
>> existing memory controllers going around, but the implementation I
>> imagine looks something like this (I think it is conceptually similar
>> to the basic beancounters idea):
>
>
> You have to increase the other processes' allocations, putting them
> over their limits. If you then force them into reclaim, they're going
> to stall, and give bad latency.


Not within a particular container. If the process exits but leaves around
some memory charge, then that just remains within the same container.

If you want to remove a container, then you have a hierarchy of billing
and your charge just gets accounted to the parent.
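
Something like this is all I have in mind (a rough sketch only; the
names here are invented, this isn't real code):

	struct mem_container {
		struct mem_container *parent;	/* NULL at the root */
		atomic_long_t pages;		/* pages billed here */
	};

	/* destroying a container: push the outstanding charge upward */
	static void mem_container_destroy(struct mem_container *mc)
	{
		atomic_long_add(atomic_long_read(&mc->pages),
				&mc->parent->pages);
		/*
		 * mc itself can't be freed until the last page
		 * backpointer to it has gone away, but it is only
		 * a word or two in size.
		 */
	}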

>
>> - anyone who allocates a page for anything gets charged for that page.
>>   Except interrupt/softirq context. Could we ignore these for the
>>   moment?
>>
>>   This does give you kernel (slab, pagetable, etc) allocations as well
>>   as userspace. I don't like the idea of doing controllers for inode
>>   cache and controllers for dentry cache, etc, etc, ad infinitum.
>>
>> - each struct page has a backpointer to its billed container. At the
>>   mm summit Linus said he didn't want back pointers, but I clarified
>>   with him and he isn't against them if they are easily configured out
>>   when not using memory controllers.
>>
>> - memory accounting containers are in a hierarchy. If you want to
>>   destroy a container but it still has billed memory outstanding, that
>>   gets charged back to the parent. The data structure itself obviously
>>   still needs to stay around, to keep the backpointers from going
>>   stale... but that could be as little as a word or two in size.
>>
>> The reason I like this way of accounting is that it can be done with a
>> couple of hooks into page_alloc.c and an ifdef in mm.h, and that is
>> the extent of the impact on core mm/, so I'd be against anything more
>> intrusive unless this really doesn't work.
>>
>
> See "inefficent" above (sorry ;-)) What you've chosen is more correct,
> but much higher overhead. The point was that there's tradeoffs either
> way - the conclusion we came to last time was that to make it 100%
> correct, you'd be better off going with a model like Xen.


So if someone says they want it 100% correct, I can tell them to use
Xen and not put accounting into any place in the kernel that allocates
memory? Sweet, OK.

If we're happy with accounting userspace memory only, then a similar
scheme can be implemented on an object-accounting basis (eg. vmas). I
think there is already something that implements this.
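
For example, charging at mmap time rather than at page allocation time
(again just a sketch, with made-up helpers):

	/* hypothetical hook run when a vma is set up */
	static int container_charge_vma(struct vm_area_struct *vma)
	{
		unsigned long pages;

		pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
		/* fail the mapping if the container is over its limit */
		return mem_container_charge(current_container(), pages);
	}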

>
> 1. You're adding a backpointer to struct page.


That's nowhere near the overhead of pte chain rmaps, though. I think it
is perfectly acceptable (assuming you *did* want to account kernel page
allocations) and probably will be difficult to notice on non-crazy-highmem
boxes. Which is just about everyone we care about now.
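
To be concrete, this is the sort of thing I mean (not a real patch;
CONFIG_MEM_CONTAINER and current_container() are invented names):

	/* mm.h: the field compiles away when the controller is off */
	#ifdef CONFIG_MEM_CONTAINER
		struct mem_container *container;  /* in struct page */
	#endif

	/* page_alloc.c: hook on the allocation path */
	static inline void container_charge_page(struct page *page)
	{
	#ifdef CONFIG_MEM_CONTAINER
		if (in_interrupt())
			return;	/* ignore irq/softirq context for now */
		page->container = current_container();
		atomic_long_inc(&page->container->pages);
	#endif
	}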

>
> 2. Each page is not accounted to one container, but shared across them,
> so the billing changes every time someone forks or exits. And not just
> for that container, but all of them. Think pte chain based rmap ...
> except worse.


In my proposed scheme, it is just the first allocator who gets charged.
You hope that, statistically, that is good enough. Otherwise you could
go into tracking which process has a reference to which dentry... good
luck getting that past Al and Christoph.

>
> 3. When a container needs to "shrink" because somebody else exits, how
> do we reclaim pages from a specific container?


That's not a problem of the accounting itself. Any other scheme will
have a similar problem.

However, having the container in the struct page *could* actually help
directed reclaim FWIW.
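
i.e. the reclaim scanner could cheaply skip pages billed to someone
else, something like (an illustrative fragment only):

	/* inside a shrink_list()-style scan over an lru list */
	list_for_each_entry_safe(page, next, page_list, lru) {
		if (page->container != target)
			continue;	/* leave other containers alone */
		/* ... try to reclaim this page as usual ... */
	}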
