linux-kernel - Re: [PATCH v3 3/4] arm64: kdump: support more than one crash kernel regions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190414121058.GC20947@rapoport-lnx>
Date:   Sun, 14 Apr 2019 15:10:59 +0300
From:   Mike Rapoport <rppt@...ux.ibm.com>
To:     Chen Zhou <chenzhou10@...wei.com>
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        ebiederm@...ssion.com, catalin.marinas@....com,
        will.deacon@....com, akpm@...ux-foundation.org,
        ard.biesheuvel@...aro.org, horms@...ge.net.au,
        takahiro.akashi@...aro.org, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, kexec@...ts.infradead.org,
        linux-mm@...ck.org, wangkefeng.wang@...wei.com
Subject: Re: [PATCH v3 3/4] arm64: kdump: support more than one crash kernel
 regions

Hi,

On Thu, Apr 11, 2019 at 08:17:43PM +0800, Chen Zhou wrote:
> Hi Mike,
> 
> This overall looks well.
> Replacing memblock_cap_memory_range() with memblock_cap_memory_ranges() was what i wanted
> to do in v1, sorry for don't express that clearly.

I didn't object to memblock_cap_memory_ranges() in general, I was worried
about it's complexity and I hoped that we could find a simpler solution.
 
> But there are some issues as below. After fixing this, it can work correctly.
> 
> On 2019/4/10 21:09, Mike Rapoport wrote:
> > Hi,
> > 
> > On Tue, Apr 09, 2019 at 06:28:18PM +0800, Chen Zhou wrote:
> >> After commit (arm64: kdump: support reserving crashkernel above 4G),
> >> there may be two crash kernel regions, one is below 4G, the other is
> >> above 4G.
> >>
> >> Crash dump kernel reads more than one crash kernel regions via a dtb
> >> property under node /chosen,
> >> linux,usable-memory-range = <BASE1 SIZE1 [BASE2 SIZE2]>
> >>
> >> Signed-off-by: Chen Zhou <chenzhou10@...wei.com>
> >> ---
> >>  arch/arm64/mm/init.c     | 66 ++++++++++++++++++++++++++++++++++++++++--------
> >>  include/linux/memblock.h |  6 +++++
> >>  mm/memblock.c            |  7 ++---
> >>  3 files changed, 66 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> index 3bebddf..0f18665 100644
> >> --- a/arch/arm64/mm/init.c
> >> +++ b/arch/arm64/mm/init.c
> >> @@ -65,6 +65,11 @@ phys_addr_t arm64_dma_phys_limit __ro_after_init;
> >>  
> >>  #ifdef CONFIG_KEXEC_CORE
> >>  
> >> +/* at most two crash kernel regions, low_region and high_region */
> >> +#define CRASH_MAX_USABLE_RANGES	2
> >> +#define LOW_REGION_IDX			0
> >> +#define HIGH_REGION_IDX			1
> >> +
> >>  /*
> >>   * reserve_crashkernel() - reserves memory for crash kernel
> >>   *
> >> @@ -297,8 +302,8 @@ static int __init early_init_dt_scan_usablemem(unsigned long node,
> >>  		const char *uname, int depth, void *data)
> >>  {
> >>  	struct memblock_region *usablemem = data;
> >> -	const __be32 *reg;
> >> -	int len;
> >> +	const __be32 *reg, *endp;
> >> +	int len, nr = 0;
> >>  
> >>  	if (depth != 1 || strcmp(uname, "chosen") != 0)
> >>  		return 0;
> >> @@ -307,22 +312,63 @@ static int __init early_init_dt_scan_usablemem(unsigned long node,
> >>  	if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
> >>  		return 1;
> >>  
> >> -	usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> >> -	usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
> >> +	endp = reg + (len / sizeof(__be32));
> >> +	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
> >> +		usablemem[nr].base = dt_mem_next_cell(dt_root_addr_cells, &reg);
> >> +		usablemem[nr].size = dt_mem_next_cell(dt_root_size_cells, &reg);
> >> +
> >> +		if (++nr >= CRASH_MAX_USABLE_RANGES)
> >> +			break;
> >> +	}
> >>  
> >>  	return 1;
> >>  }
> >>  
> >>  static void __init fdt_enforce_memory_region(void)
> >>  {
> >> -	struct memblock_region reg = {
> >> -		.size = 0,
> >> -	};
> >> +	int i, cnt = 0;
> >> +	struct memblock_region regs[CRASH_MAX_USABLE_RANGES];
> > 
> > I only now noticed that fdt_enforce_memory_region() uses memblock_region to
> > pass the ranges around. If we'd switch to memblock_type instead, the
> > implementation of memblock_cap_memory_ranges() would be really
> > straightforward. Can you check if the below patch works for you? 
> > 
> >>From e476d584098e31273af573e1a78e308880c5cf28 Mon Sep 17 00:00:00 2001
> > From: Mike Rapoport <rppt@...ux.ibm.com>
> > Date: Wed, 10 Apr 2019 16:02:32 +0300
> > Subject: [PATCH] memblock: extend memblock_cap_memory_range to multiple ranges
> > 
> > The memblock_cap_memory_range() removes all the memory except the range
> > passed to it. Extend this function to recieve memblock_type with the
> > regions that should be kept. This allows switching to simple iteration over
> > memblock arrays with 'for_each_mem_range' to remove the unneeded memory.
> > 
> > Enable use of this function in arm64 for reservation of multile regions for
> > the crash kernel.
> > 
> > Signed-off-by: Mike Rapoport <rppt@...ux.ibm.com>
> > ---
> >  arch/arm64/mm/init.c     | 34 ++++++++++++++++++++++++----------
> >  include/linux/memblock.h |  2 +-
> >  mm/memblock.c            | 45 ++++++++++++++++++++++-----------------------
> >  3 files changed, 47 insertions(+), 34 deletions(-)
> > 
> >  
> > -void __init memblock_cap_memory_range(phys_addr_t base, phys_addr_t size)
> > +void __init memblock_cap_memory_ranges(struct memblock_type *regions_to_keep)
> >  {
> > -	int start_rgn, end_rgn;
> > -	int i, ret;
> > -
> > -	if (!size)
> > -		return;
> > -
> > -	ret = memblock_isolate_range(&memblock.memory, base, size,
> > -						&start_rgn, &end_rgn);
> > -	if (ret)
> > -		return;
> > -
> > -	/* remove all the MAP regions */
> > -	for (i = memblock.memory.cnt - 1; i >= end_rgn; i--)
> > -		if (!memblock_is_nomap(&memblock.memory.regions[i]))
> > -			memblock_remove_region(&memblock.memory, i);
> > +	phys_addr_t start, end;
> > +	u64 i;
> >  
> > -	for (i = start_rgn - 1; i >= 0; i--)
> > -		if (!memblock_is_nomap(&memblock.memory.regions[i]))
> > -			memblock_remove_region(&memblock.memory, i);
> > +	/* truncate memory while skipping NOMAP regions */
> > +	for_each_mem_range(i, &memblock.memory, regions_to_keep, NUMA_NO_NODE,
> > +			   MEMBLOCK_NONE, &start, &end, NULL)
> > +		memblock_remove(start, end);
> 
> 1. use memblock_remove(start, size) instead of memblock_remove(start, end).
> 
> 2. There is a another hidden issue. We couldn't mix __next_mem_range()(called by for_each_mem_range) operation
> with remove operation because __next_mem_range() records the index of last time. If we do remove between
> __next_mem_range(), the index may be mess.

Oops, I've really missed that :)
 
> Therefore, we could do remove operation after for_each_mem_range like this, solution A:
>  void __init memblock_cap_memory_ranges(struct memblock_type *regions_to_keep)
>  {
> -	phys_addr_t start, end;
> -	u64 i;
> +	phys_addr_t start[INIT_MEMBLOCK_RESERVED_REGIONS * 2];
> +	phys_addr_t end[INIT_MEMBLOCK_RESERVED_REGIONS * 2];
> +	u64 i, nr = 0;
> 
>  	/* truncate memory while skipping NOMAP regions */
>  	for_each_mem_range(i, &memblock.memory, regions_to_keep, NUMA_NO_NODE,
> -			   MEMBLOCK_NONE, &start, &end, NULL)
> -		memblock_remove(start, end);
> +			   MEMBLOCK_NONE, &start[nr], &end[nr], NULL)
> +		nr++;
> +	for (i = 0; i < nr; i++)
> +		memblock_remove(start[i], end[i] - start[i]);
> 
>  	/* truncate the reserved regions */
> +	nr = 0;
>  	for_each_mem_range(i, &memblock.reserved, regions_to_keep, NUMA_NO_NODE,
> -			   MEMBLOCK_NONE, &start, &end, NULL)
> -		memblock_remove_range(&memblock.reserved, start, end);
> +			   MEMBLOCK_NONE, &start[nr], &end[nr], NULL)
> +		nr++;
> +	for (i = 0; i < nr; i++)
> +		memblock_remove_range(&memblock.reserved, start[i],
> +				end[i] - start[i]);
>  }
> 
> But a warning occurs when compiling:
>   CALL    scripts/atomic/check-atomics.sh
>   CALL    scripts/checksyscalls.sh
>   CHK     include/generated/compile.h
>   CC      mm/memblock.o
> mm/memblock.c: In function ‘memblock_cap_memory_ranges’:
> mm/memblock.c:1635:1: warning: the frame size of 36912 bytes is larger than 2048 bytes [-Wframe-larger-than=]
>  }
> 
> another solution is my implementation in v1, solution B:
> +void __init memblock_cap_memory_ranges(struct memblock_type *regions_to_keep)
> +{
> +   int start_rgn[INIT_MEMBLOCK_REGIONS], end_rgn[INIT_MEMBLOCK_REGIONS];
> +   int i, j, ret, nr = 0;
> +   memblock_region *regs = regions_to_keep->regions;
> +
> +   nr = regions_to_keep -> cnt;
> +   if (!nr)
> +       return;
> +
> +   /* remove all the MAP regions */
> +   for (i = memblock.memory.cnt - 1; i >= end_rgn[nr - 1]; i--)
> +       if (!memblock_is_nomap(&memblock.memory.regions[i]))
> +           memblock_remove_region(&memblock.memory, i);
> +
> +   for (i = nr - 1; i > 0; i--)
> +       for (j = start_rgn[i] - 1; j >= end_rgn[i - 1]; j--)
> +           if (!memblock_is_nomap(&memblock.memory.regions[j]))
> +               memblock_remove_region(&memblock.memory, j);
> +
> +   for (i = start_rgn[0] - 1; i >= 0; i--)
> +       if (!memblock_is_nomap(&memblock.memory.regions[i]))
> +           memblock_remove_region(&memblock.memory, i);
> +
> +   /* truncate the reserved regions */
> +   memblock_remove_range(&memblock.reserved, 0, regs[0].base);
> +
> +   for (i = nr - 1; i > 0; i--)
> +       memblock_remove_range(&memblock.reserved,
> +               regs[i - 1].base + regs[i - 1].size,
> +		regs[i].base - regs[i - 1].base - regs[i - 1].size);
> +
> +   memblock_remove_range(&memblock.reserved,
> +           regs[nr - 1].base + regs[nr - 1].size, PHYS_ADDR_MAX);
> +}
> 
> solution A: 	phys_addr_t start[INIT_MEMBLOCK_RESERVED_REGIONS * 2];
> 		phys_addr_t end[INIT_MEMBLOCK_RESERVED_REGIONS * 2];
> start, end is physical addr
> 
> solution B: 	int start_rgn[INIT_MEMBLOCK_REGIONS], end_rgn[INIT_MEMBLOCK_REGIONS];
> start_rgn, end_rgn is rgn index		
> 
> Solution B do less remove operations and with no warning comparing to solution A.
> I think solution B is better, could you give some suggestions?
 
Solution B is indeed better that solution A, but I'm still worried by
relatively large arrays on stack and the amount of loops :(

The very least we could do is to call memblock_cap_memory_range() to drop
the memory before and after the ranges we'd like to keep.

> >  
> >  	/* truncate the reserved regions */
> > -	memblock_remove_range(&memblock.reserved, 0, base);
> > -	memblock_remove_range(&memblock.reserved,
> > -			base + size, PHYS_ADDR_MAX);
> > +	for_each_mem_range(i, &memblock.reserved, regions_to_keep, NUMA_NO_NODE,
> > +			   MEMBLOCK_NONE, &start, &end, NULL)
> > +		memblock_remove_range(&memblock.reserved, start, end);
> 
> There are the same issues as above.
> 
> >  }
> >  
> >  void __init memblock_mem_limit_remove_map(phys_addr_t limit)
> >  {
> > +	struct memblock_region rgn = {
> > +		.base = 0,
> > +	};
> > +
> > +	struct memblock_type region_to_keep = {
> > +		.cnt = 1,
> > +		.max = 1,
> > +		.regions = &rgn,
> > +	};
> > +
> >  	phys_addr_t max_addr;
> >  
> >  	if (!limit)
> > @@ -1646,7 +1644,8 @@ void __init memblock_mem_limit_remove_map(phys_addr_t limit)
> >  	if (max_addr == PHYS_ADDR_MAX)
> >  		return;
> >  
> > -	memblock_cap_memory_range(0, max_addr);
> > +	region_to_keep.regions[0].size = max_addr;
> > +	memblock_cap_memory_ranges(&region_to_keep);
> >  }
> >  
> >  static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
> > 
> 
> Thanks,
> Chen Zhou
> 

-- 
Sincerely yours,
Mike.