[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1159474984.4773.138.camel@linuxchandra>
Date: Thu, 28 Sep 2006 13:23:04 -0700
From: Chandra Seetharaman <sekharan@...ibm.com>
To: rohitseth@...gle.com
Cc: linux-kernel <linux-kernel@...r.kernel.org>, devel@...nvz.org,
CKRM-Tech <ckrm-tech@...ts.sourceforge.net>
Subject: Re: [ckrm-tech] [patch00/05]: Containers(V2)- Introduction
On Thu, 2006-09-28 at 11:12 -0700, Rohit Seth wrote:
> On Wed, 2006-09-27 at 15:24 -0700, Chandra Seetharaman wrote:
> > On Wed, 2006-09-27 at 14:28 -0700, Rohit Seth wrote:
> >
> > Rohit,
> >
> > For 1-4, I understand the rationale. But, your implementation deviates
> > from the current behavior of the VM subsystem which could affect the
> > ability of these patches getting into mainline.
> >
>
> I agree that this implementation differs from existing VM subsystem.
> But the key point here is, it puts the pages that should be reclaimed.
But, you are putting the pages up for reclamation without any
consideration to the working set and system memory pressure.
> And this part needs further refining.
>
> > IMO, the current behavior in terms of reclamation, LRU, vm_swappiness,
> > and writeback logic should be maintained.
> >
>
> How? I don't want to duplicate the whole logic for containers.
We don't have to be duplicating the whole logic. Just make sure that the
existing mechanisms are aware of containers, if they exist.
<snip>
> >
> > But, it will still suffer from (1) above, as we would have no idea of
> > the current working set (LRU) (within an item or among the items).
> >
>
> Please let me know how do you propose to have another LRU for pages in
> containers. Though I can add some heuristics.
There are multiple ways as Balbir pointed in his email:
- reclamation per container (as in current RG implementation)
( + do a system wide reclaim when the system pressure is high)
- reclaim with the knowledge of containers that are over limit
(Dave Hansen's patches + avoid overhead of combing the list)
- have two lists one for the system and one per container
>
> > >
> > > > 6. Both active and inactive pages use physical pages. But, the
> > > > controller only counts active pages and not inactive pages. why ?
> > >
> > > The thought is, it is okay for containers to go over its limit as long
> >
> > Real number of "physical pages" used by the container is the sum of
> > active and inactive pages.
> >
>
> >From the user pov, the real sum of pages that are used by container for
> user land is anon + file. Now some times it is possible that there are
> active pages that are neither in page cache nor in use as anon.
>
>
> > My question is, shouldn't that be used to check against page limit
> > instead of active pages alone ?
> I can use active+inactive as the test. Sure. But I will have to also
> still have a check to make sure that number of active pages themselves
> is not bigger than page_limit.
>
> > How do we describe "page limit" as (to the user) ?
> >
>
> Amount of memory below which no container throttling will happen. And
But, from the implementation one cannot clearly derive what we mean by
"memory" here (physical ?, file + anon ?; if we say physical, it is not
correct).
> if the system is properly configured that it also ensures that this much
> memory will always be there to user. If a container goes over this
> limit then it will be throttled and it will suffer performance.
But, the user's expectation would be that we would be throwing out pages
based on LRU (within that container). But this implementation doesn't
provide that behavior. It doesn't care about the working set.
Performance impact will be lesser if we consider the working set and
throw out pages based on LRU (within a container).
>
> > > as there is enough memory in the system. When there is any memory
> > > pressure then the inactive (+ dereferenced) pages get swapped out thus
> > > penalizing the container. I'm also thinking of having hard limit for
> >
> > Reclamation goes through active pages and page cache pages before it
> > gets into inactive pages. So, this may not work as you are explaining.
>
> That is a good point. I'll have to make a check in reclaim so that when
> the system is ready for swap or write back then containers are looked
> first.
>
> >
> > > anonymous pages beyond which the container will not be able to grow its
> > > anonymous pages.
> >
> > You might break the current behavior (memory pressure must be very high
> > before these starts failing) if you are going to be strict about it.
> >
>
> That feature when implemented will be a container specific.
My point is, even though it is container specific, the behavior (inside
a container) should be same as what a user sees at the system level now.
For example, consider a workload that is run on a 1G system now, and
user sees only occasional memory allocation failures and just a handful
of oom kills. When the workload is moved to a container with 1G,
failures the user see should be in the same order ( and similar with
performance characteristics).
Do you agree that it will be the user's expectation ?
>
> > >
> > > > 7. Page limit is checked against the sum of (anon and file pages) in
> > > > some places and against active pages at some other places. IMO, it
> > > > should be always compared to the same value.
> > > >
> > > It is checked against sum of anon+file pages at the time when new pages
> >
> > why can't we check against active pages here ?
> >
> > > is getting allocated. But as the reclaimer activate the pages, so it is
> > > also important to make sure the number of active pages is not going
> > > above its limit.
> >
> > My point is that they won't be same (ever) and hence the check is
> > inconsistent.
> >
>
> The check ensures
> 1- when a new page is getting added then the total sum of pages is
> checked against the limit.
> 2- Number of active pages don't exceed the limit.
>
> These two points combined together enforce the decision that once the
> container goes over the limit, we scan the pages again to deactivate the
> excess.
Again, I understand the rationale. But it is not consistent.
>
> Thanks,
> -rohit
>
--
----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@...ibm.com | .......you may get it.
----------------------------------------------------------------------
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists