linux-kernel - Re: [RFC][PATCH 5/6] memcg: mem+swap controller

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20081107180248.39251a80.nishimura@mxp.nes.nec.co.jp>
Date:	Fri, 7 Nov 2008 18:02:48 +0900
From:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
	"menage@...gle.com" <menage@...gle.com>,
	nishimura@....nes.nec.co.jp
Subject: Re: [RFC][PATCH 5/6] memcg: mem+swap controller

On Wed, 5 Nov 2008 17:23:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:
> Mem+Swap controller core.
> 
> This patch implements per cgroup limit for usage of memory+swap.
> However there are SwapCache, double counting of swap-cache and
> swap-entry is avoided.
> 
> Mem+Swap controller works as following.
>   - memory usage is limited by memory.limit_in_bytes.
>   - memory + swap usage is limited by memory.memsw_limit_in_bytes.
> 
> 
> This has following benefits.
>   - A user can limit total resource usage of mem+swap.
> 
>     Without this, because memory resource controller doesn't take care of
>     usage of swap, a process can exhaust all the swap (by memory leak.)
>     We can avoid this case.
> 
>     And Swap is shared resource but it cannot be reclaimed (goes back to memory)
>     until it's used. This characteristic can be trouble when the memory
>     is divided into some parts by cpuset or memcg.
>     Assume group A and group B.
>     After some application executes, the system can be..
>     
>     Group A -- very large free memory space but occupy 99% of swap.
>     Group B -- under memory shortage but cannot use swap...it's nearly full.
> 
>     Ability to set appropriate swap limit for each group is required.
>       
> Maybe someone wonder "why not swap but mem+swap ?"
> 
>   - The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
>     to move account from memory to swap...there is no change in usage of
>     mem+swap.
> 
>     In other words, when we want to limit the usage of swap without affecting
>     global LRU, mem+swap limit is better than just limiting swap.
> 
> 
> Accounting target information is stored in swap_cgroup which is
> per swap entry record.
> 
> Charge is done as following.
>   map
>     - charge  page and memsw.
> 
>   unmap
>     - uncharge page/memsw if not SwapCache.
> 
>   swap-out (__delete_from_swap_cache)
>     - uncharge page
>     - record mem_cgroup information to swap_cgroup.
> 
>   swap-in (do_swap_page)
>     - charged as page and memsw.
>       record in swap_cgroup is cleared.
>       memsw accounting is decremented.
> 
>   swap-free (swap_free())
>     - if swap entry is freed, memsw is uncharged by PAGE_SIZE.
> 
> 
> After this, usual memory resource controller handles SwapCache.
> (It was lacked(ignored) feature in current memcg but must be handled.)
> 
SwapCache has been handled in [2/6] already :)

(snip)
> @@ -514,12 +534,25 @@ static int __mem_cgroup_try_charge(struc
>  		css_get(&mem->css);
>  	}
>  
> +	while (1) {
> +		int ret;
> +		bool noswap = false;
>  
> -	while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
> +		ret = res_counter_charge(&mem->res, PAGE_SIZE);
> +		if (likely(!ret)) {
> +			if (!do_swap_account)
> +				break;
> +			ret = res_counter_charge(&mem->memsw, PAGE_SIZE);
> +			if (likely(!ret))
> +				break;
> +			/* mem+swap counter fails */
> +			res_counter_uncharge(&mem->res, PAGE_SIZE);
> +			noswap = true;
> +		}
>  		if (!(gfp_mask & __GFP_WAIT))
>  			goto nomem;
>  
> -		if (try_to_free_mem_cgroup_pages(mem, gfp_mask))
> +		if (try_to_free_mem_cgroup_pages(mem, gfp_mask, noswap))
>  			continue;
>  
>  		/*
I have two comment about try_charge.

1. It would be better if possible to avoid charging memsw at swapin (and uncharging
   it again at mem_cgroup_cache_charge_swapin/mem_cgroup_commit_charge_swapin).
   How about adding a new argument "charge_memsw" ? (it has many args already now...)
2. Should we use swap when exceeding mem.limit but mem.limit == memsw.limit ?

(snip)
>  void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
> @@ -838,6 +947,7 @@ void mem_cgroup_cancel_charge_swapin(str
>  	if (!mem)
>  		return;
>  	res_counter_uncharge(&mem->res, PAGE_SIZE);
> +	res_counter_uncharge(&mem->memsw, PAGE_SIZE);
>  	css_put(&mem->css);
>  }
>  
"if (do_swap_account)" is needed before uncharging memsw.

(snip)
>  static struct cftype mem_cgroup_files[] = {
>  	{
>  		.name = "usage_in_bytes",
> -		.private = RES_USAGE,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "max_usage_in_bytes",
> -		.private = RES_MAX_USAGE,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE),
>  		.trigger = mem_cgroup_reset,
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "limit_in_bytes",
> -		.private = RES_LIMIT,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_LIMIT),
>  		.write_string = mem_cgroup_write,
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "failcnt",
> -		.private = RES_FAILCNT,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
>  		.trigger = mem_cgroup_reset,
>  		.read_u64 = mem_cgroup_read,
>  	},
> @@ -1317,6 +1541,31 @@ static struct cftype mem_cgroup_files[] 
>  		.name = "stat",
>  		.read_map = mem_control_stat_show,
>  	},
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> +	{
> +		.name = "memsw.usage_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.max_usage_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE),
> +		.trigger = mem_cgroup_reset,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.limit_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT),
> +		.write_string = mem_cgroup_write,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.failcnt",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT),
> +		.trigger = mem_cgroup_reset,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +#endif
>  };
>  
IMHO, it would be better to define those "memsw.*" files as memsw_cgroup_files[],
and change mem_cgroup_populate() like:

static int mem_cgroup_populate(struct cgroup_subsys *ss,
				struct cgroup *cont)
{
	int ret;

	ret = cgroup_add_files(cont, ss, mem_cgroup_files,
					ARRAY_SIZE(mem_cgroup_files));
	if (!ret && do_swap_account)
		ret = cgroup_add_files(cont, ss, memsw_cgroup_files,
					ARRAY_SIZE(memsw_cgroup_files));

	return ret;
}

so that those files appear only when swap accounting is enabled.


Thanks,
Daisuke Nishimura.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/