Date:	Mon, 1 Feb 2010 10:17:03 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Chris Frost <frost@...ucla.edu>
Cc:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steve Dickson <steved@...hat.com>,
	David Howells <dhowells@...hat.com>,
	Xu Chenfeng <xcf@...c.edu.cn>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Steve VanDeBogart <vandebo-lkml@...dbox.net>,
	Nick Piggin <npiggin@...e.de>
Subject: Re: [PATCH] mm/readahead.c: update the LRU positions of in-core
	pages, too

On Sun, Jan 31, 2010 at 07:06:39PM -0700, Chris Frost wrote:
> On Sun, Jan 31, 2010 at 10:31:42PM +0800, Wu Fengguang wrote:
> > On Tue, Jan 26, 2010 at 09:32:17PM +0800, Wu Fengguang wrote:
> > > On Mon, Jan 25, 2010 at 03:36:35PM -0700, Chris Frost wrote:
> > > > I changed Wu's patch to add a PageLRU() guard that I believe is required
> > > > and optimized zone lock acquisition to only unlock and lock at zone changes.
> > > > This optimization seems to provide a 10-20% system time improvement for
> > > > some of my GIMP benchmarks and no improvement for other benchmarks.
> > 
> > I feel very uncomfortable about this put_page() inside zone->lru_lock. 
> > (might deadlock: put_page() conditionally takes zone->lru_lock again)
> > 
> > If you really want the optimization, can we do it like this?
> 
> Sorry that I was slow to respond. (I was out of town.)
> 
> Thanks for catching __page_cache_release() locking the zone.
> I think staying simple for now sounds good. The below locks
> and unlocks the zone for each page. Look good?

OK :)

Thanks,
Fengguang

> ---
> readahead: retain inactive lru pages to be accessed soon
> From: Chris Frost <frost@...ucla.edu>
> 
> Ensure that cached pages in the inactive list are not prematurely evicted;
> move such pages to the lru head when they are covered by
> - in-kernel heuristic readahead
> - a posix_fadvise(POSIX_FADV_WILLNEED) hint from an application
> 
> Before this patch, pages already in core could be evicted before
> pages covered by the same prefetch scan that were not yet in core,
> forcing many small read requests on the disk.
> 
> In particular, posix_fadvise(... POSIX_FADV_WILLNEED) on an in-core page
> has no effect on the page's location in the LRU list, even if it is the
> next victim on the inactive list.
> 
> This change helps address the performance problems we encountered
> while modifying SQLite and the GIMP to use large file prefetching.
> Overall these prefetching techniques improved the runtime of large
> benchmarks by 10-17x for these applications. More in the publication
> _Reducing Seek Overhead with Application-Directed Prefetching_ in
> USENIX ATC 2009 and at http://libprefetch.cs.ucla.edu/.
> 
> Signed-off-by: Chris Frost <frost@...ucla.edu>
> Signed-off-by: Steve VanDeBogart <vandebo@...ucla.edu>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
> ---
>  readahead.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index aa1aa23..c615f96 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -9,7 +9,9 @@
>  
>  #include <linux/kernel.h>
>  #include <linux/fs.h>
> +#include <linux/memcontrol.h>
>  #include <linux/mm.h>
> +#include <linux/mm_inline.h>
>  #include <linux/module.h>
>  #include <linux/blkdev.h>
>  #include <linux/backing-dev.h>
> @@ -133,6 +135,40 @@ out:
>  }
>  
>  /*
> + * The file range is expected to be accessed in the near future.  Move
> + * pages (possibly at the inactive lru tail) to the lru head, so that
> + * they are retained in memory for a reasonable time.
> + */
> +static void retain_inactive_pages(struct address_space *mapping,
> +				  pgoff_t index, int len)
> +{
> +	int i;
> +
> +	for (i = 0; i < len; i++) {
> +		struct page *page;
> +		struct zone *zone;
> +
> +		page = find_get_page(mapping, index + i);
> +		if (!page)
> +			continue;
> +		zone = page_zone(page);
> +		spin_lock_irq(&zone->lru_lock);
> +
> +		if (PageLRU(page) &&
> +			!PageActive(page) &&
> +			!PageUnevictable(page)) {
> +			int lru = page_lru_base_type(page);
> +
> +			del_page_from_lru_list(zone, page, lru);
> +			add_page_to_lru_list(zone, page, lru);
> +		}
> +
> +		spin_unlock_irq(&zone->lru_lock);
> +		put_page(page);
> +	}
> +}
> +
> +/*
>   * __do_page_cache_readahead() actually reads a chunk of disk.  It allocates all
>   * the pages first, then submits them all for I/O. This avoids the very bad
>   * behaviour which would occur if page allocations are causing VM writeback.
> @@ -184,6 +220,14 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
>  	}
>  
>  	/*
> +	 * Normally readahead will auto-stop on cached segments, so we won't
> +	 * hit many cached pages. If it does happen, bring the inactive pages
> +	 * adjacent to the newly prefetched ones (if any).
> +	 */
> +	if (ret < nr_to_read)
> +		retain_inactive_pages(mapping, offset, page_idx);
> +
> +	/*
>  	 * Now start the IO.  We ignore I/O errors - if the page is not
>  	 * uptodate then the caller will launch readpage again, and
>  	 * will then handle the error.
> 
> -- 
> Chris Frost
> http://www.frostnet.net/chris/
