linux-kernel - Re: [PATCH 4/4] capture pages freed during direct reclaim for allocation by the reclaimer

lists.openwall.net
Open Source and information security mailing list archives



Message-Id: <1220512818.8609.174.camel@twins>
Date:	Thu, 04 Sep 2008 09:20:18 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Andy Whitcroft <apw@...dowen.org>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mel@....ul.ie>,
	Christoph Lameter <cl@...ux-foundation.org>
Subject: Re: [PATCH 4/4] capture pages freed during direct reclaim for
	allocation by the reclaimer

On Wed, 2008-09-03 at 21:53 +0100, Andy Whitcroft wrote:
> [Doh, as pointed out by Christoph the patch was missing from this one...]
> 
> When a process enters direct reclaim it will expend effort identifying
> and releasing pages in the hope of obtaining a page.  However as these
> pages are released asynchronously there is every possibility that the
> pages will have been consumed by other allocators before the reclaimer
> gets a look in.  This is particularly problematic where the reclaimer is
> attempting to allocate a higher order page.  It is highly likely that
> a parallel allocation will consume lower order constituent pages as we
> release them, preventing them from coalescing into the higher order page the
> reclaimer desires.
> 
> This patch set attempts to address this for allocations above
> ALLOC_COSTLY_ORDER by temporarily collecting the pages we are releasing
> onto a local free list.  Instead of freeing them to the main buddy lists,
> pages are collected and coalesced on this per direct reclaimer free list.
> Pages which are freed by other processes are also considered: where they
> coalesce with a page already under capture, they are moved to the
> capture list.  When pressure has been applied to a zone we then consult
> the capture list and, if there is an appropriately sized page available,
> it is taken immediately and the remainder returned to the free pool.
> Capture is only enabled when the reclaimer's allocation order exceeds
> ALLOC_COSTLY_ORDER as free pages below this order should naturally occur
> in large numbers following regular reclaim.
> 
> Thanks go to Mel Gorman for numerous discussions during the development
> of this patch and for his repeated reviews.
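The coalescing described above rests on the usual buddy arithmetic: a free block of 2^order pages at page frame number `pfn` has its buddy at `pfn ^ (1 << order)`, and the merged block starts at the lower of the two addresses. The sketch below is a toy user-space model, not code from the patch: `captured[][]`, `capture_free()` and the 8-page range are hypothetical, and one bitmap per order stands in for the real per-reclaimer capture list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: orders 0..3 over an 8-page range.
 * captured[order][pfn] == true means a free block of 2^order pages
 * starting at pfn is currently held on the capture list. */
#define SKETCH_NR_ORDERS 4
#define SKETCH_NR_PAGES  (1 << (SKETCH_NR_ORDERS - 1))

static bool captured[SKETCH_NR_ORDERS][SKETCH_NR_PAGES];

/* The buddy of a 2^order block differs from it only in bit 'order'. */
static unsigned long buddy_pfn(unsigned long pfn, int order)
{
	return pfn ^ (1UL << order);
}

/* Put a freed block on the capture list, merging it with its buddy for
 * as long as a same-order buddy is also under capture.  Returns the
 * order of the block that finally lands on the list. */
static int capture_free(unsigned long pfn, int order)
{
	assert(pfn < SKETCH_NR_PAGES && order < SKETCH_NR_ORDERS);
	assert((pfn & ((1UL << order) - 1)) == 0);	/* naturally aligned */

	while (order < SKETCH_NR_ORDERS - 1) {
		unsigned long buddy = buddy_pfn(pfn, order);

		if (!captured[order][buddy])
			break;
		captured[order][buddy] = false;	/* pull buddy off the list */
		pfn &= ~(1UL << order);		/* merged block starts lower */
		order++;
	}
	captured[order][pfn] = true;
	return order;
}
```

Freeing pages 0, 1, 3 and then 2 at order 0 leaves a single order-2 block at pfn 0 on the capture list. That is the higher-order page a parallel allocator would otherwise have prevented by grabbing one of the order-0 pieces from the main buddy lists.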

The whole series looks good; a few comments below.

Acked-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>

> Signed-off-by: Andy Whitcroft <apw@...dowen.org>
> ---

> @@ -4815,6 +4900,73 @@ out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
>  
> +#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
> +
> +/*
> + * Run through the accumulated list of captured pages and the first
> + * which is big enough to satisfy the original allocation.  Free
> + * the remainder of that page and all other pages.
> + */

That sentence looks incomplete; did you intend to write something along
the lines of:

Run through the accumulated list of captured pages and /take/ the first
which is big enough to satisfy the original allocation. Free the
remaining pages.

?
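The behaviour described in that comment (take the first sufficiently large block, free its remainder and every other captured block) can be modelled in user space as below. `carve_off()` is a hypothetical stand-in for the patch's `__carve_off()`, whose body is not part of this hunk; it assumes the requested block is kept at the start of the captured block and the rest is returned as naturally aligned halves, the way the kernel's `expand()` splits a block. `struct block` and `take_first_fit()` are likewise illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a free block: start pfn and order. */
struct block { unsigned long pfn; int order; };

/* Hypothetical analogue of __carve_off(): keep the first 2^order pages
 * of a 2^pg_order block and hand back the rest as aligned buddies,
 * largest first.  Returns the number of blocks written to 'rest'. */
static size_t carve_off(unsigned long pfn, int pg_order, int order,
			struct block *rest, size_t max_rest)
{
	size_t n = 0;

	assert(pg_order >= order);
	while (pg_order > order) {
		pg_order--;
		assert(n < max_rest);
		rest[n].pfn = pfn + (1UL << pg_order);	/* upper half */
		rest[n].order = pg_order;
		n++;
	}
	return n;
}

/* Take the first captured block of at least 'order'; everything else,
 * plus the carved-off remainder, is appended to 'freed'.  Returns the
 * pfn of the allocated block, or -1 if nothing was large enough. */
static long take_first_fit(const struct block *captured, size_t n_captured,
			   int order, struct block *freed, size_t *n_freed,
			   size_t max_freed)
{
	long taken = -1;

	*n_freed = 0;
	for (size_t i = 0; i < n_captured; i++) {
		if (taken < 0 && captured[i].order >= order) {
			taken = (long)captured[i].pfn;
			*n_freed += carve_off(captured[i].pfn, captured[i].order,
					      order, freed + *n_freed,
					      max_freed - *n_freed);
		} else {
			assert(*n_freed < max_freed);
			freed[(*n_freed)++] = captured[i];
		}
	}
	return taken;
}
```

For example, with captured blocks {pfn 8, order 1}, {pfn 16, order 4} and {pfn 40, order 2} and a request of order 2, the block at pfn 16 is taken; pfn 24 (order 3) and pfn 20 (order 2) are carved off and freed together with the other two captured blocks.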

> +struct page *capture_alloc_or_return(struct zone *zone,
> +		struct zone *preferred_zone, struct list_head *capture_list,
> +		int order, int alloc_flags, gfp_t gfp_mask)
> +{
> +	struct page *capture_page = 0;
> +	unsigned long flags;
> +	int classzone_idx = zone_idx(preferred_zone);
> +
> +	spin_lock_irqsave(&zone->lock, flags);
> +
> +	while (!list_empty(capture_list)) {
> +		struct page *page;
> +		int pg_order;
> +
> +		page = lru_to_page(capture_list);
> +		list_del(&page->lru);
> +		pg_order = page_order(page);
> +
> +		/*
> +		 * Clear out our buddy size and list information before
> +		 * releasing or allocating the page.
> +		 */
> +		rmv_page_order(page);
> +		page->buddy_free = 0;
> +		ClearPageBuddyCapture(page);
> +
> +		if (!capture_page && pg_order >= order) {
> +			__carve_off(page, pg_order, order);
> +			capture_page = page;
> +		} else
> +			__free_one_page(page, zone, pg_order);
> +	}
> +
> +	/*
> +	 * Ensure that this capture would not violate the watermarks.
> +	 * Subtle, we actually already have the page outside the watermarks
> +	 * so check if we can allocate an order 0 page.
> +	 */
> +	if (capture_page &&
> +	    (!zone_cpuset_permits(zone, alloc_flags, gfp_mask) ||
> +	     !zone_watermark_permits(zone, 0, classzone_idx,
> +					     alloc_flags, gfp_mask))) {
> +		__free_one_page(capture_page, zone, order);
> +		capture_page = NULL;
> +	}

This makes me a little sad - we got a high order page and give it away
again...

Can we start another round of direct reclaim with a lower order to try
and bring free pages back above the watermarks while we hold on to this large order page?
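The "Subtle" comment in the patch is easier to follow against the shape of the watermark test. The sketch below is modelled on `zone_watermark_ok()` as found in kernels of this period, with the `ALLOC_HIGH`/`ALLOC_HARDER` adjustments dropped; `watermark_ok()`, its parameters and `SKETCH_NR_ORDERS` are illustrative assumptions, and the patch's own `zone_watermark_permits()` helper is not reproduced. Because the captured block has already left the zone's free count, asking for order 0 reduces the check to `free_pages > min + reserve`, that is, whether an ordinary single-page allocation would still be allowed.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_ORDERS 11	/* assumption: a typical MAX_ORDER */

/* Simplified buddy-allocator watermark check.
 * free_pages: total free pages in the zone
 * nr_free[o]: number of free blocks of order o
 * min:        the watermark being tested
 * reserve:    lowmem reserve for the allocating class zone */
static bool watermark_ok(long free_pages, const long nr_free[], int order,
			 long min, long reserve)
{
	assert(order >= 0 && order < SKETCH_NR_ORDERS);

	/* Pretend the 2^order pages being requested are already gone. */
	free_pages -= (1L << order) - 1;
	if (free_pages <= min + reserve)
		return false;

	for (int o = 0; o < order; o++) {
		/* At the next order, this order's pages become unavailable. */
		free_pages -= nr_free[o] << o;
		/* Require fewer higher-order pages to be free. */
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}
```

With 100 free order-0 blocks and 10 free order-3 blocks (180 pages), an order-3 request passes against a minimum of 128 but fails against 160, because only 80 pages remain in blocks of order 1 or higher once the order-0 pages are discounted.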

> +	if (capture_page)
> +		__count_zone_vm_events(PGALLOC, zone, 1 << order);
> +
> +	zone_clear_flag(zone, ZONE_ALL_UNRECLAIMABLE);
> +	zone->pages_scanned = 0;
> +
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +
> +	if (capture_page)
> +		prep_new_page(capture_page, order, gfp_mask);
> +
> +	return capture_page;
> +}
> +
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  /*
>   * All pages in the range must be isolated before calling this.

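`lru_to_page()` at the top of the hunk picks the entry at the tail of the list (`head->prev`) and converts the embedded `struct list_head` back into the `struct page` that contains it. The user-space sketch below re-implements just enough of the list API to show that; these are simplified local versions patterned on `<linux/list.h>`, not the kernel headers, and the toy `struct page` with a `pfn` field is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list, patterned on <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

/* Recover the containing structure from a pointer to its member. */
#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the macro in the patch: the entry at the list's tail. */
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))

/* Hypothetical toy page: only what the example needs. */
struct page { unsigned long pfn; struct list_head lru; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

/* Insert 'new' right after 'head', i.e. at the front of the list. */
static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Unlink 'entry' and poison its pointers. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = NULL;
}
```

When entries have been added with `list_add()`, repeatedly taking `lru_to_page()` and deleting it, as the loop in `capture_alloc_or_return()` does, drains the list oldest entry first.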
