Message-Id: <20110111112949.57fd6fd7.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Tue, 11 Jan 2011 11:29:49 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Miklos Szeredi <miklos@...redi.hu>
Cc:	akpm@...ux-foundation.org, minchan.kim@...il.com,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	"nishimura@....nes.nec.co.jp" <nishimura@....nes.nec.co.jp>
Subject: Re: [PATCH] mm: add replace_page_cache_page() function

On Fri, 07 Jan 2011 19:22:41 +0100
Miklos Szeredi <miklos@...redi.hu> wrote:

> Here's an updated patch, addressing the review comments.
> 
> Hiroyuki-san, can you please review the newly introduced
> mem_cgroup_replace_cache_page(), as I'm not fully familiar with the
> memory cgroup code.
> 

OK, CCing Nishimura. See below.

> Thanks,
> Miklos
> ---
> 
> From: Miklos Szeredi <mszeredi@...e.cz>
> Subject: mm: add replace_page_cache_page() function
> 
> This function basically does:
> 
>      remove_from_page_cache(old);
>      page_cache_release(old);
>      add_to_page_cache_locked(new);
> 
> Except it does this atomically, so there's no possibility for the
> "add" to fail because of a race.
> 
> This is used by fuse to move pages into the page cache.
> 
> Signed-off-by: Miklos Szeredi <mszeredi@...e.cz>
> ---
>  fs/fuse/dev.c              |   10 +++------
>  include/linux/memcontrol.h |    8 +++++++
>  include/linux/pagemap.h    |    1 
>  mm/filemap.c               |   50 +++++++++++++++++++++++++++++++++++++++++++++
>  mm/memcontrol.c            |   38 ++++++++++++++++++++++++++++++++++
>  5 files changed, 101 insertions(+), 6 deletions(-)
> 
> Index: linux-2.6/mm/filemap.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap.c	2011-01-07 17:53:39.000000000 +0100
> +++ linux-2.6/mm/filemap.c	2011-01-07 19:14:45.000000000 +0100
> @@ -390,6 +390,56 @@ int filemap_write_and_wait_range(struct
>  EXPORT_SYMBOL(filemap_write_and_wait_range);
>  
>  /**
> + * replace_page_cache_page - replace a pagecache page with a new one
> + * @old:	page to be replaced
> + * @new:	page to replace with
> + * @gfp_mask:	page allocation mode
> + *
> + * This function replaces a page in the pagecache with a new one.  On
> + * success it acquires the pagecache reference for the new page and
> + * drop it for the old page.  Both the old and new pages must be
> + * locked.  This function does not add the new page to the LRU, the
> + * caller must do that.
> + *
> + * The remove + add is atomic.  The only way this function can fail is
> + * memory allocation failure.
> + */
> +int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
> +{
> +	int error;
> +
> +	VM_BUG_ON(!PageLocked(old));
> +	VM_BUG_ON(!PageLocked(new));
> +	VM_BUG_ON(new->mapping);
> +
> +	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> +	if (!error) {
> +		struct address_space *mapping = old->mapping;
> +		pgoff_t offset = old->index;
> +
> +		page_cache_get(new);
> +		new->mapping = mapping;
> +		new->index = offset;
> +
> +		spin_lock_irq(&mapping->tree_lock);
> +		__remove_from_page_cache(old);
> +		error = radix_tree_insert(&mapping->page_tree, offset, new);
> +		BUG_ON(error);
> +		mapping->nrpages++;
> +		__inc_zone_page_state(new, NR_FILE_PAGES);
> +		if (PageSwapBacked(new))
> +			__inc_zone_page_state(new, NR_SHMEM);
> +		spin_unlock_irq(&mapping->tree_lock);
> +		radix_tree_preload_end();
> +		mem_cgroup_replace_cache_page(old, new);
> +		page_cache_release(old);
> +	}
> +
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(replace_page_cache_page);
> +
> +/**
>   * add_to_page_cache_locked - add a locked page to the pagecache
>   * @page:	page to add
>   * @mapping:	the page's address_space
> Index: linux-2.6/include/linux/pagemap.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pagemap.h	2011-01-07 17:53:39.000000000 +0100
> +++ linux-2.6/include/linux/pagemap.h	2011-01-07 19:14:45.000000000 +0100
> @@ -457,6 +457,7 @@ int add_to_page_cache_lru(struct page *p
>  				pgoff_t index, gfp_t gfp_mask);
>  extern void remove_from_page_cache(struct page *page);
>  extern void __remove_from_page_cache(struct page *page);
> +int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
>  
>  /*
>   * Like add_to_page_cache_locked, but used to add newly allocated pages:
> Index: linux-2.6/fs/fuse/dev.c
> ===================================================================
> --- linux-2.6.orig/fs/fuse/dev.c	2011-01-07 17:53:39.000000000 +0100
> +++ linux-2.6/fs/fuse/dev.c	2011-01-07 19:14:45.000000000 +0100
> @@ -737,14 +737,12 @@ static int fuse_try_move_page(struct fus
>  	if (WARN_ON(PageMlocked(oldpage)))
>  		goto out_fallback_unlock;
>  
> -	remove_from_page_cache(oldpage);
> -	page_cache_release(oldpage);
> -
> -	err = add_to_page_cache_locked(newpage, mapping, index, GFP_KERNEL);
> +	err = replace_page_cache_page(oldpage, newpage, GFP_KERNEL);
>  	if (err) {
> -		printk(KERN_WARNING "fuse_try_move_page: failed to add page");
> -		goto out_fallback_unlock;
> +		unlock_page(newpage);
> +		return err;
>  	}
> +
>  	page_cache_get(newpage);
>  
>  	if (!(buf->flags & PIPE_BUF_FLAG_LRU))
> Index: linux-2.6/include/linux/memcontrol.h
> ===================================================================
> --- linux-2.6.orig/include/linux/memcontrol.h	2011-01-07 17:53:39.000000000 +0100
> +++ linux-2.6/include/linux/memcontrol.h	2011-01-07 19:14:45.000000000 +0100
> @@ -95,6 +95,9 @@ mem_cgroup_prepare_migration(struct page
>  extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
>  	struct page *oldpage, struct page *newpage);
>  
> +extern void mem_cgroup_replace_cache_page(struct page *oldpage,
> +					  struct page *newpage);
> +
>  /*
>   * For memory reclaim.
>   */
> @@ -236,6 +239,11 @@ static inline void mem_cgroup_end_migrat
>  {
>  }
>  
> +static inline void mem_cgroup_replace_cache_page(struct page *oldpage,
> +					 	struct page *newpage)
> +{
> +}
> +
>  static inline int mem_cgroup_get_reclaim_priority(struct mem_cgroup *mem)
>  {
>  	return 0;
> Index: linux-2.6/mm/memcontrol.c
> ===================================================================
> --- linux-2.6.orig/mm/memcontrol.c	2011-01-07 17:53:39.000000000 +0100
> +++ linux-2.6/mm/memcontrol.c	2011-01-07 19:20:41.000000000 +0100
> @@ -2905,6 +2905,44 @@ void mem_cgroup_end_migration(struct mem
>  }
>  
>  /*
> + * This function moves the charge from oldpage to newpage.  The new
> + * page must not be already charged.
> + */
> +void mem_cgroup_replace_cache_page(struct page *oldpage, struct page *newpage)
> +{
> +	struct page_cgroup *old_pc;
> +	struct page_cgroup *new_pc;
> +	struct mem_cgroup *mem;
> +
> +	if (mem_cgroup_disabled())
> +		return;
> +
> +	old_pc = lookup_page_cgroup(oldpage);
> +	lock_page_cgroup(old_pc);
> +	if (!PageCgroupUsed(old_pc)) {
> +		unlock_page_cgroup(old_pc);
> +		return;
> +	}
> +
> +	mem = old_pc->mem_cgroup;
> +	css_get(&mem->css);
> +	ClearPageCgroupUsed(old_pc);
> +	unlock_page_cgroup(old_pc);
> +
> +	new_pc = lookup_page_cgroup(newpage);
> +	lock_page_cgroup(new_pc);
> +	BUG_ON(PageCgroupUsed(new_pc));
> +
> +	new_pc->mem_cgroup = mem;
> +	smp_wmb();
> +	SetPageCgroupCache(new_pc);
> +	SetPageCgroupUsed(new_pc);
> +	unlock_page_cgroup(new_pc);
> +	css_put(&mem->css);
> +}

I think some of the flags that need to be moved are missing here, and this
new function is not necessary.


What I recommend is below. (Please see the newest -mm, because it contains a
bug fix for mem cgroup.) In terms of page management on the radix tree, this
can be considered a kind of page migration, which replaces a page on the
radix tree.

==

> +int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
> +{
> +	int error;
> +
> +	VM_BUG_ON(!PageLocked(old));
> +	VM_BUG_ON(!PageLocked(new));
> +	VM_BUG_ON(new->mapping);
> +
	struct mem_cgroup *memcg;

	error = mem_cgroup_prepare_migration(old, new, &memcg);
	#
	# This function charges "newpage", but it expects that the caller
	# allows a GFP_KERNEL gfp_mask.
	# After this, newpage is in the "charged" state.
	if (error)
		return -ENOMEM;

> +	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
> +	if (!error) {
> +		struct address_space *mapping = old->mapping;
> +		pgoff_t offset = old->index;
> +
> +		page_cache_get(new);
> +		new->mapping = mapping;
> +		new->index = offset;
> +
> +		spin_lock_irq(&mapping->tree_lock);
> +		__remove_from_page_cache(old);
> +		error = radix_tree_insert(&mapping->page_tree, offset, new);
> +		BUG_ON(error);
> +		mapping->nrpages++;
> +		__inc_zone_page_state(new, NR_FILE_PAGES);
> +		if (PageSwapBacked(new))
> +			__inc_zone_page_state(new, NR_SHMEM);
> +		spin_unlock_irq(&mapping->tree_lock);
> +		radix_tree_preload_end();

> +		mem_cgroup_replace_cache_page(old, new); <== remove this.

		mem_cgroup_end_migration(memcg, old, new, true);

> +		page_cache_release(old);
> +	} 
	else 
		mem_cgroup_end_migration(memcg, old, new, false);

	# Here, if the 4th argument is true, the old page is uncharged;
	# if the 4th argument is false, the new page is uncharged.
	# So the "charge" of the old page is migrated onto the new page
	# only if the replacement is done.



> +
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(replace_page_cache_page);
> +

==

I think this is simple enough, and it covers all of the memory cgroup's race
problems.
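
The charge handover described above can be modelled in userspace as a
minimal sketch. Everything here is hypothetical and for illustration only:
struct cgroup_model, struct page_model and the model_*() helpers are not
kernel types or kernel functions, and the preload_fails flag merely stands
in for a radix_tree_preload() failure. The sketch assumes that the prepare
step charges the new page up front and that the end step uncharges exactly
one of the two pages, depending on whether the replacement succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: only counts charged pages. */
struct cgroup_model {
	long charged_pages;
};

/* Hypothetical stand-in for a page plus its charge owner. */
struct page_model {
	struct cgroup_model *owner;	/* NULL means "not charged" */
	bool in_cache;
};

/* Prepare: charge the new page to the group that owns the old page. */
static int model_prepare_migration(struct page_model *oldp,
				   struct page_model *newp,
				   struct cgroup_model **memcg)
{
	*memcg = oldp->owner;
	if (*memcg == NULL)
		return 0;		/* old page is uncharged: nothing to do */

	assert(newp->owner == NULL);	/* new page must not be charged yet */
	newp->owner = *memcg;
	(*memcg)->charged_pages++;	/* both pages are charged for now */
	return 0;
}

/* End: uncharge the old page on success, the new page on failure. */
static void model_end_migration(struct cgroup_model *memcg,
				struct page_model *oldp,
				struct page_model *newp, bool ok)
{
	struct page_model *unused = ok ? oldp : newp;

	if (memcg == NULL)
		return;
	unused->owner = NULL;
	memcg->charged_pages--;
}

/* Replace oldp with newp; preload_fails simulates an allocation failure. */
static int model_replace(struct page_model *oldp, struct page_model *newp,
			 bool preload_fails)
{
	struct cgroup_model *memcg;

	if (model_prepare_migration(oldp, newp, &memcg))
		return -1;

	if (preload_fails) {
		model_end_migration(memcg, oldp, newp, false);
		return -1;
	}

	/* In the kernel this swap happens under mapping->tree_lock. */
	oldp->in_cache = false;
	newp->in_cache = true;

	model_end_migration(memcg, oldp, newp, true);
	return 0;
}
```

In this model the group's charge count is the same before and after
model_replace() on both paths: on success the charge ends up on the new
page, and on failure it stays on the old page.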

Thanks,
-Kame
