[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20081107181932.94e6f307.kamezawa.hiroyu@jp.fujitsu.com>
Date: Fri, 7 Nov 2008 18:19:32 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: Daisuke Nishimura <nishimura@....nes.nec.co.jp>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
"menage@...gle.com" <menage@...gle.com>
Subject: Re: [RFC][PATCH 5/6] memcg: mem+swap controller
On Fri, 7 Nov 2008 18:02:48 +0900
Daisuke Nishimura <nishimura@....nes.nec.co.jp> wrote:
> On Wed, 5 Nov 2008 17:23:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:
> > Mem+Swap controller core.
> >
> > This patch implements per cgroup limit for usage of memory+swap.
> > However there are SwapCache, double counting of swap-cache and
> > swap-entry is avoided.
> >
> > Mem+Swap controller works as following.
> > - memory usage is limited by memory.limit_in_bytes.
> > - memory + swap usage is limited by memory.memsw_limit_in_bytes.
> >
> >
> > This has following benefits.
> > - A user can limit total resource usage of mem+swap.
> >
> > Without this, because memory resource controller doesn't take care of
> > usage of swap, a process can exhaust all the swap (by memory leak.)
> > We can avoid this case.
> >
> > And Swap is shared resource but it cannot be reclaimed (goes back to memory)
> > until it's used. This characteristic can be trouble when the memory
> > is divided into some parts by cpuset or memcg.
> > Assume group A and group B.
> > After some application executes, the system can be..
> >
> > Group A -- very large free memory space but occupy 99% of swap.
> > Group B -- under memory shortage but cannot use swap...it's nearly full.
> >
> > Ability to set appropriate swap limit for each group is required.
> >
> > Maybe someone wonder "why not swap but mem+swap ?"
> >
> > - The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
> > to move account from memory to swap...there is no change in usage of
> > mem+swap.
> >
> > In other words, when we want to limit the usage of swap without affecting
> > global LRU, mem+swap limit is better than just limiting swap.
> >
> >
> > Accounting target information is stored in swap_cgroup which is
> > per swap entry record.
> >
> > Charge is done as following.
> > map
> > - charge page and memsw.
> >
> > unmap
> > - uncharge page/memsw if not SwapCache.
> >
> > swap-out (__delete_from_swap_cache)
> > - uncharge page
> > - record mem_cgroup information to swap_cgroup.
> >
> > swap-in (do_swap_page)
> > - charged as page and memsw.
> > record in swap_cgroup is cleared.
> > memsw accounting is decremented.
> >
> > swap-free (swap_free())
> > - if swap entry is freed, memsw is uncharged by PAGE_SIZE.
> >
> >
> > After this, usual memory resource controller handles SwapCache.
> > (It was lacked(ignored) feature in current memcg but must be handled.)
> >
> SwapCache has been handled in [2/6] already :)
>
yes. I'll rewrite this.
> (snip)
> > @@ -514,12 +534,25 @@ static int __mem_cgroup_try_charge(struc
> > css_get(&mem->css);
> > }
> >
> > + while (1) {
> > + int ret;
> > + bool noswap = false;
> >
> > - while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
> > + ret = res_counter_charge(&mem->res, PAGE_SIZE);
> > + if (likely(!ret)) {
> > + if (!do_swap_account)
> > + break;
> > + ret = res_counter_charge(&mem->memsw, PAGE_SIZE);
> > + if (likely(!ret))
> > + break;
> > + /* mem+swap counter fails */
> > + res_counter_uncharge(&mem->res, PAGE_SIZE);
> > + noswap = true;
> > + }
> > if (!(gfp_mask & __GFP_WAIT))
> > goto nomem;
> >
> > - if (try_to_free_mem_cgroup_pages(mem, gfp_mask))
> > + if (try_to_free_mem_cgroup_pages(mem, gfp_mask, noswap))
> > continue;
> >
> > /*
> I have two comment about try_charge.
>
> 1. It would be better if possible to avoid charging memsw at swapin (and uncharging
> it again at mem_cgroup_cache_charge_swapin/mem_cgroup_commit_charge_swapin).
> How about adding a new argument "charge_memsw" ? (it has many args already now...)
Hmm, maybe possible and good. I'll cosider this again.
> 2. Should we use swap when exceeding mem.limit but mem.limit == memsw.limit ?
>
I'd like to put that special case into "TODO" list. Hmm...
maybe set noswap=true in that case is enough. but we have to be careful.
> (snip)
> > void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
> > @@ -838,6 +947,7 @@ void mem_cgroup_cancel_charge_swapin(str
> > if (!mem)
> > return;
> > res_counter_uncharge(&mem->res, PAGE_SIZE);
> > + res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> > css_put(&mem->css);
> > }
> >
> "if (do_swap_account)" is needed before uncharging memsw.
>
good catch !
> (snip)
> > static struct cftype mem_cgroup_files[] = {
> > {
> > .name = "usage_in_bytes",
> > - .private = RES_USAGE,
> > + .private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
> > .read_u64 = mem_cgroup_read,
> > },
> > {
> > .name = "max_usage_in_bytes",
> > - .private = RES_MAX_USAGE,
> > + .private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE),
> > .trigger = mem_cgroup_reset,
> > .read_u64 = mem_cgroup_read,
> > },
> > {
> > .name = "limit_in_bytes",
> > - .private = RES_LIMIT,
> > + .private = MEMFILE_PRIVATE(_MEM, RES_LIMIT),
> > .write_string = mem_cgroup_write,
> > .read_u64 = mem_cgroup_read,
> > },
> > {
> > .name = "failcnt",
> > - .private = RES_FAILCNT,
> > + .private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
> > .trigger = mem_cgroup_reset,
> > .read_u64 = mem_cgroup_read,
> > },
> > @@ -1317,6 +1541,31 @@ static struct cftype mem_cgroup_files[]
> > .name = "stat",
> > .read_map = mem_control_stat_show,
> > },
> > +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > + {
> > + .name = "memsw.usage_in_bytes",
> > + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
> > + .read_u64 = mem_cgroup_read,
> > + },
> > + {
> > + .name = "memsw.max_usage_in_bytes",
> > + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE),
> > + .trigger = mem_cgroup_reset,
> > + .read_u64 = mem_cgroup_read,
> > + },
> > + {
> > + .name = "memsw.limit_in_bytes",
> > + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT),
> > + .write_string = mem_cgroup_write,
> > + .read_u64 = mem_cgroup_read,
> > + },
> > + {
> > + .name = "memsw.failcnt",
> > + .private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT),
> > + .trigger = mem_cgroup_reset,
> > + .read_u64 = mem_cgroup_read,
> > + },
> > +#endif
> > };
> >
> IMHO, it would be better to define those "memsw.*" files as memsw_cgroup_files[],
> and change mem_cgroup_populate() like:
>
> static int mem_cgroup_populate(struct cgroup_subsys *ss,
> struct cgroup *cont)
> {
> int ret;
>
> ret = cgroup_add_files(cont, ss, mem_cgroup_files,
> ARRAY_SIZE(mem_cgroup_files));
> if (!ret && do_swap_account)
> ret = cgroup_add_files(cont, ss, memsw_cgroup_files,
> ARRAY_SIZE(memsw_cgroup_files));
>
> return ret;
> }
>
> so that those files appear only when swap accounting is enabled.
>
Nice idea. I'll try that.
Thanks,
-Kame
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists