Message-Id: <20091130090448.71cf6138.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Mon, 30 Nov 2009 09:04:48 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Hugh Dickins <hugh.dickins@...cali.co.uk>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Izik Eidus <ieidus@...hat.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Chris Wright <chrisw@...hat.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 5/9] ksm: share anon page without allocating


Sorry for the delayed response.

On Tue, 24 Nov 2009 16:48:46 +0000 (GMT)
Hugh Dickins <hugh.dickins@...cali.co.uk> wrote:

> When ksm pages were unswappable, it made no sense to include them in
> mem cgroup accounting; but now that they are swappable (although I see
> no strict logical connection)
I asked for that in order to throw away things that were too complicated and a waste of time.
If the pages are not on the LRU, KSM's own limit (the ksm page limit) works well enough.

> the principle of least surprise implies
> that they should be accounted (with the usual dissatisfaction, that a
> shared page is accounted to only one of the cgroups using it).
> 
> This patch was intended to add mem cgroup accounting where necessary;
> but turned inside out, it now avoids allocating a ksm page, instead
> upgrading an anon page to ksm - which brings its existing mem cgroup
> accounting with it.  Thus mem cgroups don't appear in the patch at all.
> 
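
For concreteness, the "upgrade" is just an in-place retagging of page->mapping; the struct page itself is reused, so any existing mem cgroup charge travels with it. A minimal sketch of set_page_stable_node() as this series defines it (the PAGE_MAPPING_* low-bit encoding is assumed from the earlier patches in the series, not introduced here):

	/*
	 * Sketch: store the stable_node pointer (or NULL) in page->mapping,
	 * with the low bits marking the page PageAnon + PageKsm.
	 */
	static inline void set_page_stable_node(struct page *page,
						struct stable_node *stable_node)
	{
		page->mapping = (void *)stable_node +
				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
	}
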
OK. Then what I should look at is patch 6.

Thanks,
-Kame
> This upgrade from PageAnon to PageKsm takes place under page lock
> (via a somewhat hacky NULL kpage interface), and audit showed only
> one place which needed to cope with the race - page_referenced() is
> sometimes used without page lock, so page_lock_anon_vma() needs an
> ACCESS_ONCE() to be sure of getting anon_vma and flags together
> (no problem if the page goes ksm an instant after, the integrity
> of that anon_vma list is unaffected).
> 
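
To spell out that race: without ACCESS_ONCE() the compiler is free to reload page->mapping between the flags test and the pointer extraction, and a concurrent anon-to-ksm upgrade could slip in between the two reads. With a single read, flags and pointer come from the same snapshot. The resulting page_lock_anon_vma() path in mm/rmap.c then looks roughly like this (a sketch based on the hunk below):

	struct anon_vma *anon_vma;
	unsigned long anon_mapping;

	rcu_read_lock();
	/* one read: the KSM flag and the anon_vma pointer stay consistent */
	anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
		goto out;		/* not anon, or already a ksm page */
	if (!page_mapped(page))
		goto out;

	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
	spin_lock(&anon_vma->lock);
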
> Signed-off-by: Hugh Dickins <hugh.dickins@...cali.co.uk>
> ---
> 
>  mm/ksm.c  |   67 ++++++++++++++++------------------------------------
>  mm/rmap.c |    6 +++-
>  2 files changed, 25 insertions(+), 48 deletions(-)
> 
> --- ksm4/mm/ksm.c	2009-11-22 20:40:18.000000000 +0000
> +++ ksm5/mm/ksm.c	2009-11-22 20:40:27.000000000 +0000
> @@ -831,7 +831,8 @@ out:
>   * try_to_merge_one_page - take two pages and merge them into one
>   * @vma: the vma that holds the pte pointing to page
>   * @page: the PageAnon page that we want to replace with kpage
> - * @kpage: the PageKsm page that we want to map instead of page
> + * @kpage: the PageKsm page that we want to map instead of page,
> + *         or NULL the first time when we want to use page as kpage.
>   *
>   * This function returns 0 if the pages were merged, -EFAULT otherwise.
>   */
> @@ -864,15 +865,24 @@ static int try_to_merge_one_page(struct
>  	 * ptes are necessarily already write-protected.  But in either
>  	 * case, we need to lock and check page_count is not raised.
>  	 */
> -	if (write_protect_page(vma, page, &orig_pte) == 0 &&
> -	    pages_identical(page, kpage))
> -		err = replace_page(vma, page, kpage, orig_pte);
> +	if (write_protect_page(vma, page, &orig_pte) == 0) {
> +		if (!kpage) {
> +			/*
> +			 * While we hold page lock, upgrade page from
> +			 * PageAnon+anon_vma to PageKsm+NULL stable_node:
> +			 * stable_tree_insert() will update stable_node.
> +			 */
> +			set_page_stable_node(page, NULL);
> +			mark_page_accessed(page);
> +			err = 0;
> +		} else if (pages_identical(page, kpage))
> +			err = replace_page(vma, page, kpage, orig_pte);
> +	}
>  
> -	if ((vma->vm_flags & VM_LOCKED) && !err) {
> +	if ((vma->vm_flags & VM_LOCKED) && kpage && !err) {
>  		munlock_vma_page(page);
>  		if (!PageMlocked(kpage)) {
>  			unlock_page(page);
> -			lru_add_drain();

Is this removal of lru_add_drain() related to memcg?

>  			lock_page(kpage);
>  			mlock_vma_page(kpage);
>  			page = kpage;		/* for final unlock */
> @@ -922,7 +932,7 @@ out:
>   * This function returns the kpage if we successfully merged two identical
>   * pages into one ksm page, NULL otherwise.
>   *
> - * Note that this function allocates a new kernel page: if one of the pages
> + * Note that this function upgrades page to ksm page: if one of the pages
>   * is already a ksm page, try_to_merge_with_ksm_page should be used.
>   */
>  static struct page *try_to_merge_two_pages(struct rmap_item *rmap_item,
> @@ -930,10 +940,7 @@ static struct page *try_to_merge_two_pag
>  					   struct rmap_item *tree_rmap_item,
>  					   struct page *tree_page)
>  {
> -	struct mm_struct *mm = rmap_item->mm;
> -	struct vm_area_struct *vma;
> -	struct page *kpage;
> -	int err = -EFAULT;
> +	int err;
>  
>  	/*
>  	 * The number of nodes in the stable tree
> @@ -943,37 +950,10 @@ static struct page *try_to_merge_two_pag
>  	    ksm_max_kernel_pages <= ksm_pages_shared)
>  		return NULL;
>  
> -	kpage = alloc_page(GFP_HIGHUSER);
> -	if (!kpage)
> -		return NULL;
> -
> -	down_read(&mm->mmap_sem);
> -	if (ksm_test_exit(mm))
> -		goto up;
> -	vma = find_vma(mm, rmap_item->address);
> -	if (!vma || vma->vm_start > rmap_item->address)
> -		goto up;
> -
> -	copy_user_highpage(kpage, page, rmap_item->address, vma);
> -
> -	SetPageDirty(kpage);
> -	__SetPageUptodate(kpage);
> -	SetPageSwapBacked(kpage);
> -	set_page_stable_node(kpage, NULL);	/* mark it PageKsm */
> -	lru_cache_add_lru(kpage, LRU_ACTIVE_ANON);
> -
> -	err = try_to_merge_one_page(vma, page, kpage);
> -	if (err)
> -		goto up;
> -
> -	/* Must get reference to anon_vma while still holding mmap_sem */
> -	hold_anon_vma(rmap_item, vma->anon_vma);
> -up:
> -	up_read(&mm->mmap_sem);
> -
> +	err = try_to_merge_with_ksm_page(rmap_item, page, NULL);
>  	if (!err) {
>  		err = try_to_merge_with_ksm_page(tree_rmap_item,
> -							tree_page, kpage);
> +							tree_page, page);
>  		/*
>  		 * If that fails, we have a ksm page with only one pte
>  		 * pointing to it: so break it.
> @@ -981,11 +961,7 @@ up:
>  		if (err)
>  			break_cow(rmap_item);
>  	}
> -	if (err) {
> -		put_page(kpage);
> -		kpage = NULL;
> -	}
> -	return kpage;
> +	return err ? NULL : page;
>  }
>  
>  /*
> @@ -1244,7 +1220,6 @@ static void cmp_and_merge_page(struct pa
>  				stable_tree_append(rmap_item, stable_node);
>  			}
>  			unlock_page(kpage);
> -			put_page(kpage);
>  
>  			/*
>  			 * If we fail to insert the page into the stable tree,
> --- ksm4/mm/rmap.c	2009-11-22 20:40:11.000000000 +0000
> +++ ksm5/mm/rmap.c	2009-11-22 20:40:27.000000000 +0000
> @@ -204,7 +204,7 @@ struct anon_vma *page_lock_anon_vma(stru
>  	unsigned long anon_mapping;
>  
>  	rcu_read_lock();
> -	anon_mapping = (unsigned long) page->mapping;
> +	anon_mapping = (unsigned long) ACCESS_ONCE(page->mapping);
>  	if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
>  		goto out;
>  	if (!page_mapped(page))
> @@ -666,7 +666,9 @@ static void __page_check_anon_rmap(struc
>   * @address:	the user virtual address mapped
>   *
>   * The caller needs to hold the pte lock, and the page must be locked in
> - * the anon_vma case: to serialize mapping,index checking after setting.
> + * the anon_vma case: to serialize mapping,index checking after setting,
> + * and to ensure that PageAnon is not being upgraded racily to PageKsm
> + * (but PageKsm is never downgraded to PageAnon).
>   */
>  void page_add_anon_rmap(struct page *page,
>  	struct vm_area_struct *vma, unsigned long address)
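
For reference, under this series' encoding a ksm page is an anon page whose mapping word also carries the KSM bit, so PageAnon(page) stays true across the upgrade; a sketch of the predicates, assumed to match include/linux/mm.h of this era:

	#define PAGE_MAPPING_ANON	1
	#define PAGE_MAPPING_KSM	2
	#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)

	static inline int PageAnon(struct page *page)
	{
		return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
	}

	/* never cleared back to plain anon: PageKsm is a one-way upgrade */
	static inline int PageKsm(struct page *page)
	{
		return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) ==
				(PAGE_MAPPING_ANON | PAGE_MAPPING_KSM);
	}
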
