Date:	Tue, 13 Mar 2012 14:18:23 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Hugh Dickins <hughd@...gle.com>
Cc:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Hillf Danton <dhillf@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] memcg: fix behavior of shared anon pages at task_move
 (Was Re: [PATCH v3 2/2] memcg: avoid THP split in task migration)

On Fri, 9 Mar 2012 13:05:05 -0800 (PST)
Hugh Dickins <hughd@...gle.com> wrote:

> On Fri, 9 Mar 2012, KAMEZAWA Hiroyuki wrote:
> > From 1012e97e3b123fb80d0ec6b1f5d3dbc87a5a5139 Mon Sep 17 00:00:00 2001
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>

> > Suggested-by: Hugh Dickins <hughd@...gle.com>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> 
> Yes, this is much like what I had in mind (well, on disk):
> I'm glad we are in agreement now.
> 
> But first priority, I think, should be to revert to the silly "> 2"
> test for 3.3 final, so it behaves just like 2.6.35 through 3.2:
> I'll send Andrew and Linus a patch for that shortly.
> 
> > ---
> >  Documentation/cgroups/memory.txt |   10 ++++++++--
> >  include/linux/swap.h             |    9 ---------
> >  mm/memcontrol.c                  |   15 +++------------
> >  mm/swapfile.c                    |   31 -------------------------------
> >  4 files changed, 11 insertions(+), 54 deletions(-)
> > 
> > diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> > index 4c95c00..16bc9f2 100644
> > --- a/Documentation/cgroups/memory.txt
> > +++ b/Documentation/cgroups/memory.txt
> > @@ -185,6 +185,9 @@ behind this approach is that a cgroup that aggressively uses a shared
> >  page will eventually get charged for it (once it is uncharged from
> >  the cgroup that brought it in -- this will happen on memory pressure).
> >  
> > +(*)See section 8.2. At task moving, you can recharge mapped pages to other
> > +   cgroup.
> > +
> 
> Perhaps:
> 
> But see section 8.2: when moving a task to another cgroup, its pages may
> be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
> 
> (I've intentionally omitted the word "mapped" there because it can move
> even unmapped file pages and swapped-out anon pages.)
> 

Sure.


> >  Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
> >  When you do swapoff and make swapped-out pages of shmem(tmpfs) to
> >  be backed into memory in force, charges for pages are accounted against the
> > @@ -623,8 +626,8 @@ memory cgroup.
> >    bit | what type of charges would be moved ?
> >   -----+------------------------------------------------------------------------
> >     0  | A charge of an anonymous page(or swap of it) used by the target task.
> > -      | Those pages and swaps must be used only by the target task. You must
> > -      | enable Swap Extension(see 2.4) to enable move of swap charges.
> > +      | Even if it's shared, it will be moved in force(*). You must enable Swap
> 
> What is this "force"?  I solved the Documentation problem by simply
> removing one sentence, but you have tried harder.
> 

OK, I will remove the "force" wording. (Sorry, that was not clear.)



> > +      | Extension(see 2.4) to enable move of swap charges.
> >   -----+------------------------------------------------------------------------
> >     1  | A charge of file pages(normal file, tmpfs file(e.g. ipc shared memory)
> >        | and swaps of tmpfs file) mmapped by the target task. Unlike the case of
> > @@ -635,6 +638,9 @@ memory cgroup.
> >        | page_mapcount(page) > 1). You must enable Swap Extension(see 2.4) to
> >        | enable move of swap charges.
> >  
> > +(*) Because of this, fork() -> move -> exec() will move all parent's page
> > +    to the target cgroup. Please be careful.
> > +
> 
> It only moves those pages which were in the old cgroup.
> 


You're right.

I'll consider a simpler one.
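Maybe just something like this (rough idea only, wording still to be polished):

    (*) Only pages charged to the task's old (source) cgroup are moved.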



> >  8.3 TODO
> >  
> >  - Implement madvise(2) to let users decide the vma to be moved or not to be
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index f7df3ea..13c8d6f 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -390,7 +390,6 @@ static inline void deactivate_swap_token(struct mm_struct *mm, bool swap_token)
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
> >  extern void
> >  mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout);
> > -extern int mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep);
> >  #else
> >  static inline void
> >  mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
> > @@ -535,14 +534,6 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
> >  {
> >  }
> >  
> > -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> > -static inline int
> > -mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep)
> > -{
> > -	return 0;
> > -}
> > -#endif
> > -
> >  #endif /* CONFIG_SWAP */
> >  #endif /* __KERNEL__*/
> >  #endif /* _LINUX_SWAP_H */
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index c83aeb5..e7e4e3d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5100,12 +5100,9 @@ static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
> >  
> >  	if (!page || !page_mapped(page))
> >  		return NULL;
> > -	if (PageAnon(page)) {
> > -		/* we don't move shared anon */
> > -		if (!move_anon() || page_mapcount(page) > 2)
> > -			return NULL;
> > +	if (PageAnon(page) && !move_anon()) {
> > +		return NULL;
> >  	} else if (!move_file())
> > -		/* we ignore mapcount for file pages */
> >  		return NULL;
> 
> Doesn't that need to be
> 
> 	if (PageAnon(page)) {
> 		if (!move_anon())
> 			return NULL;
> 	} else if (!move_file())
> 		return NULL;
> 
> I think what you've written there makes treatment of anon pages
> in the move_anon() case dependent on move_file(), doesn't it?
> 

Will rewrite.
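Something like this, then (untested, following your suggestion), so that the
anon-page check no longer falls through to move_file():

	if (!page || !page_mapped(page))
		return NULL;
	if (PageAnon(page)) {
		if (!move_anon())
			return NULL;
	} else if (!move_file())
		return NULL;
	if (!get_page_unless_zero(page))
		return NULL;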



> >  	if (!get_page_unless_zero(page))
> >  		return NULL;
> > @@ -5116,18 +5113,12 @@ static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
> >  static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
> >  			unsigned long addr, pte_t ptent, swp_entry_t *entry)
> >  {
> > -	int usage_count;
> >  	struct page *page = NULL;
> >  	swp_entry_t ent = pte_to_swp_entry(ptent);
> >  
> >  	if (!move_anon() || non_swap_entry(ent))
> >  		return NULL;
> > -	usage_count = mem_cgroup_count_swap_user(ent, &page);
> > -	if (usage_count > 1) { /* we don't move shared anon */
> > -		if (page)
> > -			put_page(page);
> > -		return NULL;
> > -	}
> > +	page = lookup_swap_cache(ent);
> 
> It's annoying, but there's a tiny reason why that line needs to be
> 
> #ifdef CONFIG_SWAP
> 	page = find_get_page(&swapper_space, ent.val);
> #endif
> 
> because lookup_swap_cache() updates some antiquated swap lookup stats
> (which I do sometimes look at), which a straight find_get_page() avoids,
> rightly in this case; but swapper_space is only defined when CONFIG_SWAP=y.
> 
> I was about to propose that we just add an arg to lookup_swap_cache(),
> for updating those stats or not; but see that mincore.c would like a
> little more than that to avoid its CONFIG_SWAPs, not sure quite what.
> 
> So for now, please just use the #ifdef CONFIG_SWAP find_get_page()
> that we use elsewhere, and I'll try to remove those warts later on.
> 

Okay, I will do that.
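For the record, a rough (untested) sketch of what mc_handle_swap_pte() would
then do around the lookup:

	if (!move_anon() || non_swap_entry(ent))
		return NULL;
#ifdef CONFIG_SWAP
	/*
	 * find_get_page() instead of lookup_swap_cache(), so the swap-cache
	 * lookup stats are not bumped; swapper_space is only defined when
	 * CONFIG_SWAP=y, hence the #ifdef.
	 */
	page = find_get_page(&swapper_space, ent.val);
#endif
	if (do_swap_account)
		entry->val = ent.val;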


> >  	if (do_swap_account)
> >  		entry->val = ent.val;
> >  
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index fa3c519..85b4548 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -720,37 +720,6 @@ int free_swap_and_cache(swp_entry_t entry)
> >  	return p != NULL;
> >  }
> >  
> > -#ifdef CONFIG_CGROUP_MEM_RES_CTLR
> > -/**
> > - * mem_cgroup_count_swap_user - count the user of a swap entry
> > - * @ent: the swap entry to be checked
> > - * @pagep: the pointer for the swap cache page of the entry to be stored
> > - *
> > - * Returns the number of the user of the swap entry. The number is valid only
> > - * for swaps of anonymous pages.
> > - * If the entry is found on swap cache, the page is stored to pagep with
> > - * refcount of it being incremented.
> > - */
> > -int mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep)
> > -{
> > -	struct page *page;
> > -	struct swap_info_struct *p;
> > -	int count = 0;
> > -
> > -	page = find_get_page(&swapper_space, ent.val);
> > -	if (page)
> > -		count += page_mapcount(page);
> > -	p = swap_info_get(ent);
> > -	if (p) {
> > -		count += swap_count(p->swap_map[swp_offset(ent)]);
> > -		spin_unlock(&swap_lock);
> > -	}
> > -
> > -	*pagep = page;
> > -	return count;
> > -}
> > -#endif
> > -
> 
> Exactly, delete delete delete!
> 
 :)

Thanks,
-Kame

