Message-ID: <20151207073523.GA27292@js1304-P5Q-DELUXE>
Date: Mon, 7 Dec 2015 16:35:24 +0900
From: Joonsoo Kim <iamjoonsoo.kim@....com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Aaron Lu <aaron.lu@...el.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Rik van Riel <riel@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Mel Gorman <mgorman@...e.de>, Minchan Kim <minchan@...nel.org>
Subject: Re: [RFC 0/3] reduce latency of direct async compaction
On Fri, Dec 04, 2015 at 01:34:09PM +0100, Vlastimil Babka wrote:
> On 12/03/2015 12:52 PM, Aaron Lu wrote:
> >On Thu, Dec 03, 2015 at 07:35:08PM +0800, Aaron Lu wrote:
> >>On Thu, Dec 03, 2015 at 10:38:50AM +0100, Vlastimil Babka wrote:
> >>>On 12/03/2015 10:25 AM, Aaron Lu wrote:
> >>>>On Thu, Dec 03, 2015 at 09:10:44AM +0100, Vlastimil Babka wrote:
> >>
> >>My bad, I uploaded the wrong data :-/
> >>I uploaded again:
> >>https://drive.google.com/file/d/0B49uX3igf4K4UFI4TEQ3THYta0E
> >>
> >>And I just ran the base tree with trace-cmd and found that its
> >>performance drops significantly (from 1000MB/s to 6xxMB/s). Does
> >>trace-cmd impact performance that much?
>
> Yeah it has some overhead depending on how many events it has to
> process. Your workload is quite sensitive to that.
>
> >>Any suggestions on how to run
> >>the test regarding trace-cmd? i.e. should I always run usemem under
> >>trace-cmd or only when necessary?
>
> I'd run it with tracing only when the goal is to collect traces, but
> not for any performance comparisons. Also it's not useful to collect
> perf data while also tracing.
>
> >I just ran the test with the base tree and with this patch series
> >applied (head). I didn't use trace-cmd this time.
> >
> >The throughput for the base tree is 963MB/s while the head is 815MB/s. I
> >have attached pagetypeinfo/proc-vmstat/perf-profile for them.
>
> The compact stats improvements look fine, perhaps better than in my tests:
>
> base: compact_migrate_scanned 3476360
> head: compact_migrate_scanned 1020827
>
> - that's the eager skipping of patch 2
>
> base: compact_free_scanned 5924928
> head: compact_free_scanned 0
> compact_free_direct 918813
> compact_free_direct_miss 500308
>
> As your workload does exclusively async direct compaction through
> THP faults, the traditional free scanner isn't used at all. Direct
> allocations should be much cheaper, although the "miss" ratio (the
> allocations that were from the same pageblock as the one we are
> compacting) is quite high. I should probably look into making
> migration release pages to the tails of the freelists - could be
> that it's grabbing the very pages that were just freed in the
> previous COMPACT_CLUSTER_MAX cycle (modulo pcplist buffering).
>
> I however find it strange that your original stats (4.3?) differ
> from the base so much:
>
> compact_migrate_scanned 1982396
> compact_free_scanned 40576943
>
> That was an order of magnitude more free scanned on 4.3, and half the
> migrate scanned. But your throughput figures in the other mail
> suggested a regression from 4.3 to 4.4, which would be the opposite
> of what the stats say. And anyway, compaction code didn't change
> between 4.3 and 4.4 except changes to tracepoint format...
>
> moving on...
> base:
> compact_isolated 731304
> compact_stall 10561
> compact_fail 9459
> compact_success 1102
>
> head:
> compact_isolated 921087
> compact_stall 14451
> compact_fail 12550
> compact_success 1901
>
> More success in both isolation and compaction results.
>
> base:
> thp_fault_alloc 45337
> thp_fault_fallback 2349
>
> head:
> thp_fault_alloc 45564
> thp_fault_fallback 2120
>
> Somehow the extra compact success didn't fully translate to thp
> alloc success... But given how many of the allocs didn't even
> involve a compact_stall (two thirds of them), that interpretation
> could also be easily misleading. So, hard to say.
>
> Looking at the perf profiles...
> base:
> 54.55% 54.55% :1550 [kernel.kallsyms] [k]
> pageblock_pfn_to_page
>
> head:
> 40.13% 40.13% :1551 [kernel.kallsyms] [k]
> pageblock_pfn_to_page
>
> Since the freepage allocation doesn't hit this code anymore, it
> shows that the bulk was actually from the migration scanner,
> although the perf callgraph and vmstats suggested otherwise.
It looks like the overhead still remains. I guess that the migration
scanner calls pageblock_pfn_to_page() over a more extended range, so
the overhead is still there.

I have an idea to solve this problem. Aaron, could you test the following
patch on top of base? It tries to skip calling pageblock_pfn_to_page()
if a check at the initialization stage found the zone to be contiguous.
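
To illustrate the idea outside the kernel, here is a rough user-space
sketch. It is only a toy model under my own assumptions: every toy_*
name is hypothetical, the "zone" is a plain array of valid bits, the zone
start is assumed to be pageblock aligned, and each block is checked
against its own end pfn, which is not exactly what the patch below does.
The point is just that the per-block check is paid once per zone and the
cached result lets later lookups take a fast path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
#define TOY_PAGEBLOCK_PAGES 4UL

/* Tri-state cache, mirroring the 0 / 1 / -1 values used in the patch. */
enum toy_contig { TOY_UNKNOWN = 0, TOY_CONTIGUOUS = 1, TOY_HAS_HOLE = -1 };

struct toy_zone {
	unsigned long start_pfn;	/* first pfn of the zone */
	unsigned long end_pfn;		/* one past the last pfn */
	const bool *valid;		/* valid[pfn - start_pfn] */
	enum toy_contig contiguous;	/* cached result of the scan */
	unsigned long slow_checks;	/* how often the slow path ran */
};

static bool toy_pfn_valid(const struct toy_zone *z, unsigned long pfn)
{
	return pfn >= z->start_pfn && pfn < z->end_pfn &&
	       z->valid[pfn - z->start_pfn];
}

/* Slow path: look at the first and last pfn of the block [start, end). */
static bool toy_block_usable_slow(struct toy_zone *z, unsigned long start,
				  unsigned long end)
{
	assert(start < end);
	z->slow_checks++;
	return toy_pfn_valid(z, start) && toy_pfn_valid(z, end - 1);
}

/* Scan every block once and cache whether the zone has a hole. */
static void toy_check_zone_contiguous(struct toy_zone *z)
{
	unsigned long pfn, end;

	if (z->contiguous != TOY_UNKNOWN)
		return;		/* already checked */

	for (pfn = z->start_pfn; pfn < z->end_pfn; pfn = end) {
		end = pfn + TOY_PAGEBLOCK_PAGES;
		if (end > z->end_pfn)
			end = z->end_pfn;
		if (!toy_block_usable_slow(z, pfn, end)) {
			z->contiguous = TOY_HAS_HOLE;
			return;
		}
	}
	z->contiguous = TOY_CONTIGUOUS;
}

/* Fast path: skip the per-block check when the zone is known contiguous. */
static bool toy_block_usable(struct toy_zone *z, unsigned long start,
			     unsigned long end)
{
	if (z->contiguous == TOY_CONTIGUOUS)
		return true;
	return toy_block_usable_slow(z, start, end);
}
```

With an 8-pfn zone and no holes, toy_check_zone_contiguous() runs the
slow check once per block, and later toy_block_usable() calls do not
touch the slow path at all. With a hole, every lookup still goes through
the slow check, as before.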
Thanks.
---->8----
From 9c4fbf8f8ed37eb88a04a97908e76ba2437404a2 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@....com>
Date: Mon, 7 Dec 2015 14:51:42 +0900
Subject: [PATCH] mm/compaction: Optimize pageblock_pfn_to_page() for
contiguous zone
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
---
include/linux/mmzone.h | 1 +
mm/compaction.c | 35 ++++++++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e23a9e7..573f9a9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -521,6 +521,7 @@ struct zone {
#endif
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
+ int contiguous;
/* Set to true when the PG_migrate_skip bits should be cleared */
bool compact_blockskip_flush;
#endif
diff --git a/mm/compaction.c b/mm/compaction.c
index 67b8d90..f4e8c89 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -88,7 +88,7 @@ static inline bool migrate_async_suitable(int migratetype)
* the first and last page of a pageblock and avoid checking each individual
* page in a pageblock.
*/
-static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+static struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
unsigned long end_pfn, struct zone *zone)
{
struct page *start_page;
@@ -114,6 +114,37 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
return start_page;
}
+static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
+ unsigned long end_pfn, struct zone *zone)
+{
+ if (zone->contiguous == 1)
+ return pfn_to_page(start_pfn);
+
+ return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
+}
+
+static void check_zone_contiguous(struct zone *zone)
+{
+ unsigned long pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = zone_end_pfn(zone);
+
+ /* Already checked */
+ if (zone->contiguous)
+ return;
+
+ pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+ for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
+ if (!__pageblock_pfn_to_page(pfn, end_pfn, zone)) {
+ /* We have hole */
+ zone->contiguous = -1;
+ return;
+ }
+ }
+
+ /* We don't have hole */
+ zone->contiguous = 1;
+}
+
#ifdef CONFIG_COMPACTION
/* Do not skip compaction more than 64 times */
@@ -1353,6 +1384,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
;
}
+ check_zone_contiguous(zone);
+
/*
* Clear pageblock skip if there were failures recently and compaction
* is about to be retried after being deferred. kswapd does not do
--
1.9.1