Date:	Fri, 27 May 2016 15:27:02 +0800
From:	Feng Tang <feng.tang@...el.com>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	"mgorman@...hsingularity.net" <mgorman@...hsingularity.net>,
	Laura Abbott <lauraa@...eaurora.org>,
	Minchan Kim <minchan@...nel.org>,
	Marek Szyprowski <m.szyprowski@...sung.com>,
	Michal Nazarewicz <mina86@...a86.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Rui Teng <rui.teng@...ux.vnet.ibm.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 0/6] Introduce ZONE_CMA

On Fri, May 27, 2016 at 02:42:18PM +0800, Joonsoo Kim wrote:
> On Fri, May 27, 2016 at 02:25:27PM +0800, Feng Tang wrote:
> > On Fri, May 27, 2016 at 01:28:20PM +0800, Joonsoo Kim wrote:
> > > On Thu, May 26, 2016 at 04:04:54PM +0800, Feng Tang wrote:
> > > > On Thu, May 26, 2016 at 02:22:22PM +0800, js1304@...il.com wrote:
> > > > > From: Joonsoo Kim <iamjoonsoo.kim@....com>
> > > > 
> >  
> > > > > FYI, there is another attempt [3] trying to solve this problem in lkml.
> > > > > And, as far as I know, Qualcomm also has an out-of-tree solution for this
> > > > > problem.
> > > > 
> > > > This may be a little off-topic :) Actually, we have used another way in
> > > > our products: we disable the fallback from MIGRATETYPE_MOVABLE to
> > > > MIGRATETYPE_CMA completely, and only allow free CMA memory to be used
> > > > by the file page cache (which is easy to reclaim by its nature).
> > > > We did it by adding a GFP_PAGE_CACHE flag to every allocation request for
> > > > page cache; the MM will try to pick an available free CMA page
> > > > first, and fall back to the normal path when that fails.
> > > 
> > > Just wondering, why do you allow CMA memory to be used for the file page cache
> > > rather than anonymous pages? I guess that anonymous pages would be more easily
> > > migrated/reclaimed than file page cache. In fact, some of our products
> > > use an anonymous-page adaptation to satisfy a similar requirement by
> > > introducing GFP_CMA. AFAIK, some chip vendors also use an "anonymous
> > > page first" adaptation to get a better success rate.
> > 
> > The biggest problem we faced is allocating a big chunk of CMA memory,
> > say 256MB as a whole, or 9 pieces of 20MB buffers, so speed
> > is not the biggest concern, but whether all the CMA pages can be reclaimed.
> 
> Okay. Our product has a similar workload.
> 
> > With the MOVABLE fallback, there may be many kinds of bad guys from device
> > drivers, the kernel, or different subsystems who refuse to return the borrowed
> > CMA pages, so I took a lazy way by only allowing the page cache to use free
> > CMA pages, and we saw good results which could pass most of the tests for
> > allocating big chunks.
> 
> Could you explain more about why you chose the file page cache rather than
> anonymous pages? If there is a reason, I'd like to test it myself.

I didn't make it clear. This is not about anonymous pages, but about MIGRATETYPE_MOVABLE.

Following is the patch to disable the kernel's default sharing (against kernel 3.14).
It removes MIGRATE_CMA from the MIGRATE_MOVABLE fallback list, and frees CMA pages
straight back to the CMA free list instead of through the per-cpu lists:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1b5f20e..a5e698f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -974,7 +974,11 @@ static int fallbacks[MIGRATE_TYPES][4] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,     MIGRATE_RESERVE },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,     MIGRATE_RESERVE },
 #ifdef CONFIG_CMA
-	[MIGRATE_MOVABLE]     = { MIGRATE_CMA,         MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
 	[MIGRATE_CMA]         = { MIGRATE_RESERVE }, /* Never used */
 	[MIGRATE_CMA_ISOLATE] = { MIGRATE_RESERVE }, /* Never used */
 #else
@@ -1414,6 +1418,18 @@ void free_hot_cold_page(struct page *page, int cold)
 	local_irq_save(flags);
 	__count_vm_event(PGFREE);
 
+#ifndef CONFIG_USE_CMA_FALLBACK
+	if (migratetype == MIGRATE_CMA) {
+		free_one_page(zone, page, 0, MIGRATE_CMA);
+		local_irq_restore(flags);
+		return;
+	}
+#endif
+

> 
> > One of our customers used to use a CMA sharing patch from another vendor
> > on our SoCs, which couldn't pass these tests, and finally took our page cache
> > approach.
> 
> CMA has too many problems, so each vendor uses their own adaptation. I'd
> like to solve this code fragmentation by fixing the problems in the upstream
> kernel, and this ZONE_CMA series is one of those efforts. If you can share a
> pointer to your adaptation, it would be very helpful to me.

As I said, I started to work on the CMA problem back in 2014, and faced many
of these reclamation failures. I didn't have the time and capability
to track/analyze each and every failure, so I decided to go another way by
only allowing the page cache to use CMA.  And frankly speaking, I don't have
detailed performance measurements, only some rough data showing that it
did improve CMA page reclaiming and the usage rate.

Our patches were based on 3.14 (the Android Marshmallow kernel). Earlier this
year I finally got some free time and worked on cleaning them up for submission
to LKML, but found your CMA improvement patches merged in 4.1 or 4.2, so I gave
up, as my patches are more hacky :)

The sharing patch is here FYI:
------
commit fb28d4db6278df42ab2ef4996bdfd44e613ace99
Author: Feng Tang <feng.tang@...el.com>
Date:   Wed Jul 15 13:39:50 2015 +0800

    cma, page-cache: use cma as page cache
    
    This frees a lot of CMA memory for the system to use as
    page cache. Previously, CMA memory was mostly preserved
    and difficult for others to share, thus a big waste.
    
    Using it as page cache improves memory usage, while
    keeping the flexibility of fast reclaiming when a big CMA
    memory request comes.
    
    Some of the threshold values should be adjustable for
    different platforms with different CMA reserved memory; common
    CMA usage scenarios and the CTS test should be carefully verified
    for those adjustments.
    
    Signed-off-by: Feng Tang <feng.tang@...el.com>

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5dc12b7..3c3ab2b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -36,6 +36,7 @@ struct vm_area_struct;
 #define ___GFP_NO_KSWAPD	0x400000u
 #define ___GFP_OTHER_NODE	0x800000u
 #define ___GFP_WRITE		0x1000000u
+#define ___GFP_CMA_PAGE_CACHE	0x2000000u
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -123,6 +124,9 @@ struct vm_area_struct;
 			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | \
 			 __GFP_NO_KSWAPD)
 
+/* Allocate for page cache use */
+#define GFP_PAGE_CACHE	((__force gfp_t)___GFP_CMA_PAGE_CACHE)
+
 /*
  * GFP_THISNODE does not perform any reclaim, you most likely want to
  * use __GFP_THISNODE to allocate from a given node without fallback!
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1710d1b..a2452f6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -221,7 +221,7 @@ extern struct page *__page_cache_alloc(gfp_t gfp);
 #else
 static inline struct page *__page_cache_alloc(gfp_t gfp)
 {
-	return alloc_pages(gfp, 0);
+	return alloc_pages(gfp | GFP_PAGE_CACHE, 0);
 }
 #endif
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 532ee0d..1b5f20e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1568,7 +1568,7 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 	int cold = !!(gfp_flags & __GFP_COLD);
 
 again:
-	if (likely(order == 0)) {
+	if (likely(order == 0) && !(gfp_flags & GFP_PAGE_CACHE)) {
 		struct per_cpu_pages *pcp;
 		struct list_head *list;
 
@@ -2744,6 +2744,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET;
 	struct mem_cgroup *memcg = NULL;
 
+	gfp_allowed_mask |= GFP_PAGE_CACHE;
+
 	gfp_mask &= gfp_allowed_mask;
 
 	lockdep_trace_alloc(gfp_mask);
@@ -2753,6 +2755,25 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	if (should_fail_alloc_page(gfp_mask, order))
 		return NULL;
 
+#ifdef CONFIG_CMA
+	if (gfp_mask & GFP_PAGE_CACHE) {
+		int nr_free = global_page_state(NR_FREE_PAGES)
+				- totalreserve_pages;
+		int free_cma = global_page_state(NR_FREE_CMA_PAGES);
+
+		/*
+		 * Use CMA memory as page cache iff the system is under memory
+		 * pressure and free CMA is big enough (>= 48M).  These
+		 * values should be adjustable for different platforms with
+		 * different CMA reserved memory
+		 */
+		if ((nr_free - free_cma) <= (48 * 1024 * 1024 / PAGE_SIZE)
+			&& free_cma >= (48 * 1024 * 1024 / PAGE_SIZE)) {
+			migratetype = MIGRATE_CMA;
+		}
+	}
+#endif
+
 	/*
 	 * Check the zones suitable for the gfp_mask contain at least one
 	 * valid zone. It's possible to have an empty zonelist as a result
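
For reference, assuming the common 4KB PAGE_SIZE, the 48M threshold in the
check above works out to

	48 * 1024 * 1024 / 4096 = 12288 pages

so the page cache allocation is steered to MIGRATE_CMA only when non-CMA free
memory (nr_free - free_cma) has dropped to 12288 pages or fewer while at least
12288 free CMA pages are still available.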



 
