lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 23 Oct 2017 13:55:47 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Pavel Tatashin <pasha.tatashin@...cle.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        mgorman@...hsingularity.net, akpm@...ux-foundation.org
Subject: Re: [PATCH v1] mm: broken deferred calculation

On Fri 20-10-17 21:17:07, Pavel Tatashin wrote:
> In reset_deferred_meminit we determine number of pages that must not be
> deferred. We initialize pages for at least 2G of memory, but also pages for
> reserved memory in this node.
> 
> The reserved memory is determined in this function:
> memblock_reserved_memory_within(), which operates over physical addresses,
> and returns size in bytes. However, reset_deferred_meminit() assumes that
> that this function operates with pfns, and returns page count.
> 
> The result is that in the best case machine boots slower than expected
> due to initializing more pages than needed in single thread, and in the
> worst case panics because fewer than needed pages are initialized early.

Hmm, I have definitely screwed up pfns and addresses here. I am
wondering how this could work in the end. I remember this has been
tested on the PPC machine which exhibited the problem.

> Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@...cle.com>

Thanks for catching that! The patch could have been simpler without all
the renames but I have no objections here.
Acked-by: Michal Hocko <mhocko@...e.com>

> ---
>  include/linux/mmzone.h |  3 ++-
>  mm/page_alloc.c        | 27 ++++++++++++++++++---------
>  2 files changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index a6f361931d52..d45ba78c7e42 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -699,7 +699,8 @@ typedef struct pglist_data {
>  	 * is the first PFN that needs to be initialised.
>  	 */
>  	unsigned long first_deferred_pfn;
> -	unsigned long static_init_size;
> +	/* Number of non-deferred pages */
> +	unsigned long static_init_pgcnt;
>  #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 97687b38da05..16419cdbbb7a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -289,28 +289,37 @@ EXPORT_SYMBOL(nr_online_nodes);
>  int page_group_by_mobility_disabled __read_mostly;
>  
>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +
> +/*
> + * Determine how many pages need to be initialized durig early boot
> + * (non-deferred initialization).
> + * The value of first_deferred_pfn will be set later, once non-deferred pages
> + * are initialized, but for now set it ULONG_MAX.
> + */
>  static inline void reset_deferred_meminit(pg_data_t *pgdat)
>  {
> -	unsigned long max_initialise;
> -	unsigned long reserved_lowmem;
> +	phys_addr_t start_addr, end_addr;
> +	unsigned long max_pgcnt;
> +	unsigned long reserved;
>  
>  	/*
>  	 * Initialise at least 2G of a node but also take into account that
>  	 * two large system hashes that can take up 1GB for 0.25TB/node.
>  	 */
> -	max_initialise = max(2UL << (30 - PAGE_SHIFT),
> -		(pgdat->node_spanned_pages >> 8));
> +	max_pgcnt = max(2UL << (30 - PAGE_SHIFT),
> +			(pgdat->node_spanned_pages >> 8));
>  
>  	/*
>  	 * Compensate the all the memblock reservations (e.g. crash kernel)
>  	 * from the initial estimation to make sure we will initialize enough
>  	 * memory to boot.
>  	 */
> -	reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
> -			pgdat->node_start_pfn + max_initialise);
> -	max_initialise += reserved_lowmem;
> +	start_addr = PFN_PHYS(pgdat->node_start_pfn);
> +	end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt);
> +	reserved = memblock_reserved_memory_within(start_addr, end_addr);
> +	max_pgcnt += PHYS_PFN(reserved);
>  
> -	pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
> +	pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages);
>  	pgdat->first_deferred_pfn = ULONG_MAX;
>  }
>  
> @@ -337,7 +346,7 @@ static inline bool update_defer_init(pg_data_t *pgdat,
>  	if (zone_end < pgdat_end_pfn(pgdat))
>  		return true;
>  	(*nr_initialised)++;
> -	if ((*nr_initialised > pgdat->static_init_size) &&
> +	if ((*nr_initialised > pgdat->static_init_pgcnt) &&
>  	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
>  		pgdat->first_deferred_pfn = pfn;
>  		return false;
> -- 
> 2.14.2

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ