Message-ID: <56C0550F.8020402@huawei.com>
Date:	Sun, 14 Feb 2016 18:21:03 +0800
From:	zhong jiang <zhongjiang@...wei.com>
To:	Joonsoo Kim <js1304@...il.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	Aaron Lu <aaron.lu@...el.com>, Mel Gorman <mgorman@...e.de>,
	Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>,
	"Linux Memory Management List" <linux-mm@...ck.org>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Xishi Qiu <qiuxishi@...wei.com>
Subject: Re: [PATCH v2 3/3] mm/compaction: speed up pageblock_pfn_to_page()
 when zone is contiguous

On 2016/2/6 0:11, Joonsoo Kim wrote:
> 2016-02-05 9:49 GMT+09:00 Andrew Morton <akpm@...ux-foundation.org>:
>> On Thu,  4 Feb 2016 15:19:35 +0900 Joonsoo Kim <js1304@...il.com> wrote:
>>
>>> There is a report of a performance drop during hugepage allocation, where
>>> half of the cpu time is spent in pageblock_pfn_to_page() during compaction [1].
>>> In that workload, compaction is triggered to make a hugepage, but most
>>> pageblocks are unavailable for compaction due to their pageblock type and
>>> skip bit, so compaction usually fails. The most costly operation in this
>>> case is finding a valid pageblock while scanning the whole zone range. To
>>> check whether a pageblock is valid to compact, a valid pfn within the
>>> pageblock is required, and we can obtain it by calling
>>> pageblock_pfn_to_page(). This function checks whether the pageblock lies in
>>> a single zone and returns a valid pfn if possible. The problem is that we
>>> need to perform this check every time before scanning a pageblock, even if
>>> we re-visit it, and this turns out to be very expensive in this workload.
>>>
>>> Although we have no way to skip this pageblock check on a system
>>> where holes exist at arbitrary positions, we can cache the zone's
>>> continuity and just do pfn_to_page() on a system where no hole
>>> exists. This optimization considerably speeds up the above workload.
>>>
>>> Before vs After
>>> Max: 1096 MB/s vs 1325 MB/s
>>> Min: 635 MB/s vs 1015 MB/s
>>> Avg: 899 MB/s vs 1194 MB/s
>>>
>>> Avg is improved by roughly 30% [2].
>>>
>>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>>> [2]: https://lkml.org/lkml/2015/12/9/23
>>>
>>> ...
>>>
>>> --- a/include/linux/memory_hotplug.h
>>> +++ b/include/linux/memory_hotplug.h
>>> @@ -196,6 +196,9 @@ void put_online_mems(void);
>>>  void mem_hotplug_begin(void);
>>>  void mem_hotplug_done(void);
>>>
>>> +extern void set_zone_contiguous(struct zone *zone);
>>> +extern void clear_zone_contiguous(struct zone *zone);
>>> +
>>>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>>>  /*
>>>   * Stub functions for when hotplug is off
>>
>> Was it really intended that these declarations only exist if
>> CONFIG_MEMORY_HOTPLUG?  Seems unrelated.
> 
> These are called to cache whether the memory layout is contiguous
> or not, so they are always called during memory initialization. Since
> hotplug can change the memory layout, they should be called
> there, too. That is why they are defined in page_alloc.c and exported
> only if CONFIG_MEMORY_HOTPLUG.
> 
>> The i386 allnoconfig build fails in predictable ways, so I fixed that up
>> as below, but it seems wrong.
> 
> Yeah, it seems wrong to me. :)
> Here goes fix.
> 
> ----------->8------------
> From ed6add18bc361e00a7ac6746de6eeb62109e6416 Mon Sep 17 00:00:00 2001
> From: Joonsoo Kim <iamjoonsoo.kim@....com>
> Date: Thu, 10 Dec 2015 17:03:54 +0900
> Subject: [PATCH] mm/compaction: speed up pageblock_pfn_to_page() when zone is
>  contiguous
> 
> There is a report of a performance drop during hugepage allocation, where
> half of the cpu time is spent in pageblock_pfn_to_page() during compaction [1].
> In that workload, compaction is triggered to make a hugepage, but most
> pageblocks are unavailable for compaction due to their pageblock type and
> skip bit, so compaction usually fails. The most costly operation in this
> case is finding a valid pageblock while scanning the whole zone range. To
> check whether a pageblock is valid to compact, a valid pfn within the
> pageblock is required, and we can obtain it by calling
> pageblock_pfn_to_page(). This function checks whether the pageblock lies in
> a single zone and returns a valid pfn if possible. The problem is that we
> need to perform this check every time before scanning a pageblock, even if
> we re-visit it, and this turns out to be very expensive in this workload.
> 
> Although we have no way to skip this pageblock check on a system
> where holes exist at arbitrary positions, we can cache the zone's
> continuity and just do pfn_to_page() on a system where no hole
> exists. This optimization considerably speeds up the above workload.
> 
> Before vs After
> Max: 1096 MB/s vs 1325 MB/s
> Min: 635 MB/s vs 1015 MB/s
> Avg: 899 MB/s vs 1194 MB/s
> 
> Avg is improved by roughly 30% [2].
> 
> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
> [2]: https://lkml.org/lkml/2015/12/9/23
> 
> v3
> o remove the pfn_valid_within() check for all pages in the pageblock,
> because pageblock_pfn_to_page() is only called with pageblock-aligned pfns.
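> 
> That is, callers pass pageblock-aligned boundaries; roughly like this
> (the caller shape is an assumption for illustration, not the exact
> scanner code):
> 
> 	block_start_pfn = pfn & ~(pageblock_nr_pages - 1);
> 	block_end_pfn = block_start_pfn + pageblock_nr_pages;
> 	page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn, zone);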

I have a question about the zone continuity check, because a hole can exist
at an arbitrary position within a pageblock. Therefore, pageblock_pfn_to_page()
alone is insufficient; whether the pfn is pageblock aligned or not, the
pfn_valid_within() check is still necessary.

e.g.: 120M-122M is a pageblock range, but 120.5M-121.5M is a hole; a
conclusion drawn from pageblock_pfn_to_page() alone would be inaccurate.
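
A sketch of the scanner-side check I mean (the loop shape is assumed for
illustration, not the exact compaction code):

	/*
	 * Even after pageblock_pfn_to_page() succeeds for a block, each
	 * pfn must still be validated on arches that allow holes inside
	 * a pageblock (CONFIG_HOLES_IN_ZONE):
	 */
	for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
		if (!pfn_valid_within(pfn))	/* per-page hole check */
			continue;
		page = pfn_to_page(pfn);
		/* ... examine the page ... */
	}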

Thanks
zhongjiang

> v2
> o check zone continuity after initialization
> o handle the memory-hotplug case
> 
> Reported-and-tested-by: Aaron Lu <aaron.lu@...el.com>
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> ---
>  include/linux/gfp.h            |  6 ----
>  include/linux/memory_hotplug.h |  3 ++
>  include/linux/mmzone.h         |  2 ++
>  mm/compaction.c                | 43 -----------------------
>  mm/internal.h                  | 12 +++++++
>  mm/memory_hotplug.c            |  9 +++++
>  mm/page_alloc.c                | 78 +++++++++++++++++++++++++++++++++++++++++-
>  7 files changed, 103 insertions(+), 50 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 28ad5f6..bd7fccc 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -515,13 +515,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
>  void drain_all_pages(struct zone *zone);
>  void drain_local_pages(struct zone *zone);
> 
> -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  void page_alloc_init_late(void);
> -#else
> -static inline void page_alloc_init_late(void)
> -{
> -}
> -#endif
> 
>  /*
>   * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 4340599..e960b78 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -196,6 +196,9 @@ void put_online_mems(void);
>  void mem_hotplug_begin(void);
>  void mem_hotplug_done(void);
> 
> +extern void set_zone_contiguous(struct zone *zone);
> +extern void clear_zone_contiguous(struct zone *zone);
> +
>  #else /* ! CONFIG_MEMORY_HOTPLUG */
>  /*
>   * Stub functions for when hotplug is off
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7b6c2cf..f12b950 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -520,6 +520,8 @@ struct zone {
>   bool compact_blockskip_flush;
>  #endif
> 
> + bool contiguous;
> +
>   ZONE_PADDING(_pad3_)
>   /* Zone statistics */
>   atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8ce36eb..93f71d9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -71,49 +71,6 @@ static inline bool migrate_async_suitable(int migratetype)
>   return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE;
>  }
> 
> -/*
> - * Check that the whole (or subset of) a pageblock given by the interval of
> - * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> - * with the migration of free compaction scanner. The scanners then need to
> - * use only pfn_valid_within() check for arches that allow holes within
> - * pageblocks.
> - *
> - * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> - *
> - * It's possible on some configurations to have a setup like node0 node1 node0
> - * i.e. it's possible that all pages within a zones range of pages do not
> - * belong to a single zone. We assume that a border between node0 and node1
> - * can occur within a single pageblock, but not a node0 node1 node0
> - * interleaving within a single pageblock. It is therefore sufficient to check
> - * the first and last page of a pageblock and avoid checking each individual
> - * page in a pageblock.
> - */
> -static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> - unsigned long end_pfn, struct zone *zone)
> -{
> - struct page *start_page;
> - struct page *end_page;
> -
> - /* end_pfn is one past the range we are checking */
> - end_pfn--;
> -
> - if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> - return NULL;
> -
> - start_page = pfn_to_page(start_pfn);
> -
> - if (page_zone(start_page) != zone)
> - return NULL;
> -
> - end_page = pfn_to_page(end_pfn);
> -
> - /* This gives a shorter code than deriving page_zone(end_page) */
> - if (page_zone_id(start_page) != page_zone_id(end_page))
> - return NULL;
> -
> - return start_page;
> -}
> -
>  #ifdef CONFIG_COMPACTION
> 
>  /* Do not skip compaction more than 64 times */
> diff --git a/mm/internal.h b/mm/internal.h
> index 9006ce1..9609755 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -140,6 +140,18 @@ __find_buddy_index(unsigned long page_idx, unsigned int order)
>   return page_idx ^ (1 << order);
>  }
> 
> +extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> + unsigned long end_pfn, struct zone *zone);
> +
> +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
> + unsigned long end_pfn, struct zone *zone)
> +{
> + if (zone->contiguous)
> + return pfn_to_page(start_pfn);
> +
> + return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
> +}
> +
>  extern int __isolate_free_page(struct page *page, unsigned int order);
>  extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
>   unsigned int order);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4af58a3..f06916c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -509,6 +509,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>   int start_sec, end_sec;
>   struct vmem_altmap *altmap;
> 
> + clear_zone_contiguous(zone);
> +
>   /* during initialize mem_map, align hot-added range to section */
>   start_sec = pfn_to_section_nr(phys_start_pfn);
>   end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1);
> @@ -540,6 +542,8 @@ int __ref __add_pages(int nid, struct zone *zone, unsigned long phys_start_pfn,
>   }
>   vmemmap_populate_print_last();
> 
> + set_zone_contiguous(zone);
> +
>   return err;
>  }
>  EXPORT_SYMBOL_GPL(__add_pages);
> @@ -811,6 +815,8 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>   }
>   }
> 
> + clear_zone_contiguous(zone);
> +
>   /*
>   * We can only remove entire sections
>   */
> @@ -826,6 +832,9 @@ int __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
>   if (ret)
>   break;
>   }
> +
> + set_zone_contiguous(zone);
> +
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(__remove_pages);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d60c860..059f9c0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1105,6 +1105,75 @@ void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
>   return __free_pages_boot_core(page, pfn, order);
>  }
> 
> +/*
> + * Check that the whole (or subset of) a pageblock given by the interval of
> + * [start_pfn, end_pfn) is valid and within the same zone, before scanning it
> + * with the migration of free compaction scanner. The scanners then need to
> + * use only pfn_valid_within() check for arches that allow holes within
> + * pageblocks.
> + *
> + * Return struct page pointer of start_pfn, or NULL if checks were not passed.
> + *
> + * It's possible on some configurations to have a setup like node0 node1 node0
> + * i.e. it's possible that all pages within a zones range of pages do not
> + * belong to a single zone. We assume that a border between node0 and node1
> + * can occur within a single pageblock, but not a node0 node1 node0
> + * interleaving within a single pageblock. It is therefore sufficient to check
> + * the first and last page of a pageblock and avoid checking each individual
> + * page in a pageblock.
> + */
> +struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
> + unsigned long end_pfn, struct zone *zone)
> +{
> + struct page *start_page;
> + struct page *end_page;
> +
> + /* end_pfn is one past the range we are checking */
> + end_pfn--;
> +
> + if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn))
> + return NULL;
> +
> + start_page = pfn_to_page(start_pfn);
> +
> + if (page_zone(start_page) != zone)
> + return NULL;
> +
> + end_page = pfn_to_page(end_pfn);
> +
> + /* This gives a shorter code than deriving page_zone(end_page) */
> + if (page_zone_id(start_page) != page_zone_id(end_page))
> + return NULL;
> +
> + return start_page;
> +}
> +
> +void set_zone_contiguous(struct zone *zone)
> +{
> + unsigned long block_start_pfn = zone->zone_start_pfn;
> + unsigned long block_end_pfn;
> +
> + block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
> + for (; block_start_pfn < zone_end_pfn(zone);
> + block_start_pfn = block_end_pfn,
> + block_end_pfn += pageblock_nr_pages) {
> +
> + block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
> +
> + if (!__pageblock_pfn_to_page(block_start_pfn,
> + block_end_pfn, zone))
> + return;
> + }
> +
> + /* We confirm that there is no hole */
> + zone->contiguous = true;
> +}
> +
> +void clear_zone_contiguous(struct zone *zone)
> +{
> + zone->contiguous = false;
> +}
> +
>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>  static void __init deferred_free_range(struct page *page,
>   unsigned long pfn, int nr_pages)
> @@ -1255,9 +1324,13 @@ free_range:
>   pgdat_init_report_one_done();
>   return 0;
>  }
> +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
> 
>  void __init page_alloc_init_late(void)
>  {
> + struct zone *zone;
> +
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>   int nid;
> 
>   /* There will be num_node_state(N_MEMORY) threads */
> @@ -1271,8 +1344,11 @@ void __init page_alloc_init_late(void)
> 
>   /* Reinit limits that are based on free pages after the kernel is up */
>   files_maxfiles_init();
> +#endif
> +
> + for_each_populated_zone(zone)
> + set_zone_contiguous(zone);
>  }
> -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
> 
>  #ifdef CONFIG_CMA
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */

