lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 31 Dec 2019 09:39:58 +0800
From:   Chen Zhou <chenzhou10@...wei.com>
To:     Dave Young <dyoung@...hat.com>
CC:     <tglx@...utronix.de>, <mingo@...hat.com>,
        <catalin.marinas@....com>, <will@...nel.org>,
        <james.morse@....com>, <bhsharma@...hat.com>, <horms@...ge.net.au>,
        <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>, <kexec@...ts.infradead.org>,
        <linux-doc@...r.kernel.org>, <xiexiuqi@...wei.com>,
        kbuild test robot <lkp@...el.com>
Subject: Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into
 crash_core.c

Hi Dave,

On 2019/12/28 17:32, Dave Young wrote:
> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>> Hi Dave
>>
>> On 2019/12/27 13:54, Dave Young wrote:
>>> Hi,
>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>
>>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>>>> is specified. Different with x86_64, don't set low memory automatically.
>>>
>>> Do you have any reason for the difference?  I'd expect we have same
>>> logic if possible and remove some of the ifdefs.
>>
>> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
>> to reserve low memory.
>>
>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
>> In this case, if reserve crashkernel below 4G there will be 256M low memory set automatically
>> and this needs extra considerations.
> 
> Sorry that I did not read the old thread details and thought that is
> arch dependent.  But rethink about that, it would be better that we can
> have same semantic about crashkernel parameters across arches.  If we
> make them different then it causes confusion, especially for
> distributions.
> 
> OTOH, I thought if we reserve high memory then the low memory should be
> needed.  There might be some exceptions, but I do not know the exact
> one, can we make the behavior same, and special case those systems which
> do not need low memory reservation.
> 
I thought like this and did implement with crashkernel parameters arch independent.
This is my v4: https://lkml.org/lkml/2019/5/6/1361, i implemented according to x86_64's
behavior.

>>
>> previous discusses:
>> 	https://lkml.org/lkml/2019/6/5/670
>> 	https://lkml.org/lkml/2019/6/13/229
> 
> Another concern from James:
> "
> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
> "Crash kernel". Because its sorted by address, and kexec-tools stops searching when it
> find "Crash kernel", you are always going to get the kernel placed in the lower portion.
> "
> 
> The kexec-tools code is iterating all "Crash kernel" ranges and add them
> in an array.  In X86 code, it uses the higher range to locate memory.

We also discussed about this: https://lkml.org/lkml/2019/6/13/227.
I guess James's opinion is that kexec-tools should take forward compatibility into account.
"But we can't rely on people updating user-space when they update the kernel!" -- James

> 
>>
>>>
>>>>
>>>> Reported-by: kbuild test robot <lkp@...el.com>
>>>> Signed-off-by: Chen Zhou <chenzhou10@...wei.com>
>>>> ---
>>>>  arch/x86/kernel/setup.c    | 62 ++++-----------------------------
>>>>  include/linux/crash_core.h |  3 ++
>>>>  include/linux/kexec.h      |  2 --
>>>>  kernel/crash_core.c        | 87 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>  kernel/kexec_core.c        | 17 ---------
>>>>  5 files changed, 96 insertions(+), 75 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>>> index cedfe20..5f38942 100644
>>>> --- a/arch/x86/kernel/setup.c
>>>> +++ b/arch/x86/kernel/setup.c
>>>> @@ -486,59 +486,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>>>>  # define CRASH_ADDR_HIGH_MAX	SZ_64T
>>>>  #endif
>>>>  
>>>> -static int __init reserve_crashkernel_low(void)
>>>> -{
>>>> -#ifdef CONFIG_X86_64
>>>> -	unsigned long long base, low_base = 0, low_size = 0;
>>>> -	unsigned long total_low_mem;
>>>> -	int ret;
>>>> -
>>>> -	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>>> -
>>>> -	/* crashkernel=Y,low */
>>>> -	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &base);
>>>> -	if (ret) {
>>>> -		/*
>>>> -		 * two parts from kernel/dma/swiotlb.c:
>>>> -		 * -swiotlb size: user-specified with swiotlb= or default.
>>>> -		 *
>>>> -		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>>>> -		 * to 8M for other buffers that may need to stay low too. Also
>>>> -		 * make sure we allocate enough extra low memory so that we
>>>> -		 * don't run out of DMA buffers for 32-bit devices.
>>>> -		 */
>>>> -		low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>>>> -	} else {
>>>> -		/* passed with crashkernel=0,low ? */
>>>> -		if (!low_size)
>>>> -			return 0;
>>>> -	}
>>>> -
>>>> -	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>>> -	if (!low_base) {
>>>> -		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>>> -		       (unsigned long)(low_size >> 20));
>>>> -		return -ENOMEM;
>>>> -	}
>>>> -
>>>> -	ret = memblock_reserve(low_base, low_size);
>>>> -	if (ret) {
>>>> -		pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
>>>> -		return ret;
>>>> -	}
>>>> -
>>>> -	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
>>>> -		(unsigned long)(low_size >> 20),
>>>> -		(unsigned long)(low_base >> 20),
>>>> -		(unsigned long)(total_low_mem >> 20));
>>>> -
>>>> -	crashk_low_res.start = low_base;
>>>> -	crashk_low_res.end   = low_base + low_size - 1;
>>>> -	insert_resource(&iomem_resource, &crashk_low_res);
>>>> -#endif
>>>> -	return 0;
>>>> -}
>>>> -
>>>>  static void __init reserve_crashkernel(void)
>>>>  {
>>>>  	unsigned long long crash_size, crash_base, total_mem;
>>>> @@ -602,9 +549,12 @@ static void __init reserve_crashkernel(void)
>>>>  		return;
>>>>  	}
>>>>  
>>>> -	if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>>>> -		memblock_free(crash_base, crash_size);
>>>> -		return;
>>>> +	if (crash_base >= (1ULL << 32)) {
>>>> +		if (reserve_crashkernel_low()) {
>>>> +			memblock_free(crash_base, crash_size);
>>>> +			return;
>>>> +		}
>>>> +		insert_resource(&iomem_resource, &crashk_low_res);
>>>
>>> Some specific reason to move insert_resouce out of the
>>> reserve_crashkernel_low function?
>>
>> No specific reason.
>> I just exposed arm64 "Crash kernel low" in request_standard_resources() as other resources,
>> so did this change.
> 
> Ok.
> 
>>
>>>
>>>>  	}
>>>>  
>>>>  	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
>>>> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
>>>> index 525510a..4df8c0b 100644
>>>> --- a/include/linux/crash_core.h
>>>> +++ b/include/linux/crash_core.h
>>>> @@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
>>>>  extern unsigned char *vmcoreinfo_data;
>>>>  extern size_t vmcoreinfo_size;
>>>>  extern u32 *vmcoreinfo_note;
>>>> +extern struct resource crashk_res;
>>>> +extern struct resource crashk_low_res;
>>>>  
>>>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>>>  			  void *data, size_t data_len);
>>>> @@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>>>>  		unsigned long long *crash_size, unsigned long long *crash_base);
>>>>  int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>>>>  		unsigned long long *crash_size, unsigned long long *crash_base);
>>>> +int __init reserve_crashkernel_low(void);
>>>>  
>>>>  #endif /* LINUX_CRASH_CORE_H */
>>>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>>>> index 1776eb2..5d5d963 100644
>>>> --- a/include/linux/kexec.h
>>>> +++ b/include/linux/kexec.h
>>>> @@ -330,8 +330,6 @@ extern int kexec_load_disabled;
>>>>  
>>>>  /* Location of a reserved region to hold the crash kernel.
>>>>   */
>>>> -extern struct resource crashk_res;
>>>> -extern struct resource crashk_low_res;
>>>>  extern note_buf_t __percpu *crash_notes;
>>>>  
>>>>  /* flag to track if kexec reboot is in progress */
>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>> index 9f1557b..eb72fd6 100644
>>>> --- a/kernel/crash_core.c
>>>> +++ b/kernel/crash_core.c
>>>> @@ -7,6 +7,8 @@
>>>>  #include <linux/crash_core.h>
>>>>  #include <linux/utsname.h>
>>>>  #include <linux/vmalloc.h>
>>>> +#include <linux/memblock.h>
>>>> +#include <linux/swiotlb.h>
>>>>  
>>>>  #include <asm/page.h>
>>>>  #include <asm/sections.h>
>>>> @@ -19,6 +21,22 @@ u32 *vmcoreinfo_note;
>>>>  /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
>>>>  static unsigned char *vmcoreinfo_data_safecopy;
>>>>  
>>>> +/* Location of the reserved area for the crash kernel */
>>>> +struct resource crashk_res = {
>>>> +	.name  = "Crash kernel",
>>>> +	.start = 0,
>>>> +	.end   = 0,
>>>> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>> +	.desc  = IORES_DESC_CRASH_KERNEL
>>>> +};
>>>> +struct resource crashk_low_res = {
>>>> +	.name  = "Crash kernel",
>>>> +	.start = 0,
>>>> +	.end   = 0,
>>>> +	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>> +	.desc  = IORES_DESC_CRASH_KERNEL
>>>> +};
>>>> +
>>>>  /*
>>>>   * parsing the "crashkernel" commandline
>>>>   *
>>>> @@ -292,6 +310,75 @@ int __init parse_crashkernel_low(char *cmdline,
>>>>  				"crashkernel=", suffix_tbl[SUFFIX_LOW]);
>>>>  }
>>>>  
>>>> +#if defined(CONFIG_X86_64)
>>>> +#define CRASH_ALIGN		SZ_16M
>>>> +#elif defined(CONFIG_ARM64)
>>>> +#define CRASH_ALIGN		SZ_2M
>>>> +#endif
>>>
>>> I think no need to have the #ifdef, although I can not think out of
>>> reason we have 16M for X86, maybe move it to 2M as well if no other
>>> objections.  Then it will be easier to reserve crashkernel successfully
>>> considering nowadays we have KASLR and other stuff it becomes harder.
>>
>> I also don't figure out why it is 16M in x86.
> 
> IMHO, if we do not know why and in theory it should work with 2M, can
> you do some basic testing and move it to 2M?
> 
> We can easily move back to 16M if someone really report something, but
> if we do not change it will always stay there but we do not know why.

Ok. I will do some test later.

> 
>>
>>>
>>>> +
>>>> +int __init reserve_crashkernel_low(void)
>>>> +{
>>>> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
>>>> +	unsigned long long base, low_base = 0, low_size = 0;
>>>> +	unsigned long total_low_mem;
>>>> +	int ret;
>>>> +
>>>> +	total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>>> +
>>>> +	/* crashkernel=Y,low */
>>>> +	ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size,
>>>> +			&base);
>>>> +	if (ret) {
>>>> +#ifdef CONFIG_X86_64
>>>> +		/*
>>>> +		 * two parts from lib/swiotlb.c:
>>>> +		 * -swiotlb size: user-specified with swiotlb= or default.
>>>> +		 *
>>>> +		 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>>>> +		 * to 8M for other buffers that may need to stay low too. Also
>>>> +		 * make sure we allocate enough extra low memory so that we
>>>> +		 * don't run out of DMA buffers for 32-bit devices.
>>>> +		 */
>>>> +		low_size = max(swiotlb_size_or_default() + (8UL << 20),
>>>> +				256UL << 20);
>>>> +#else
>>>> +		/*
>>>> +		 * in arm64, reserve low memory if and only if crashkernel=X,low
>>>> +		 * specified.
>>>> +		 */
>>>> +		return -EINVAL;
>>>> +#endif
>>>
>>> As said before, can you explore about why it needs different logic, it
>>> would be good to keep two arches same.
>>>
>>>> +	} else {
>>>> +		/* passed with crashkernel=0,low ? */
>>>> +		if (!low_size)
>>>> +			return 0;
>>>> +	}
>>>> +
>>>> +	low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>>>> +	if (!low_base) {
>>>> +		pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
>>>> +		       (unsigned long)(low_size >> 20));
>>>> +		return -ENOMEM;
>>>> +	}
>>>> +
>>>> +	ret = memblock_reserve(low_base, low_size);
>>>> +	if (ret) {
>>>> +		pr_err("%s: Error reserving crashkernel low memblock.\n",
>>>> +				__func__);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
>>>> +		(unsigned long)(low_size >> 20),
>>>> +		(unsigned long)(low_base >> 20),
>>>> +		(unsigned long)(total_low_mem >> 20));
>>>> +
>>>> +	crashk_low_res.start = low_base;
>>>> +	crashk_low_res.end   = low_base + low_size - 1;
>>>> +#endif
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>>>  			  void *data, size_t data_len)
>>>>  {
>>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>>> index 15d70a9..458d093 100644
>>>> --- a/kernel/kexec_core.c
>>>> +++ b/kernel/kexec_core.c
>>>> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>>>>  /* Flag to indicate we are going to kexec a new kernel */
>>>>  bool kexec_in_progress = false;
>>>>  
>>>> -
>>>> -/* Location of the reserved area for the crash kernel */
>>>> -struct resource crashk_res = {
>>>> -	.name  = "Crash kernel",
>>>> -	.start = 0,
>>>> -	.end   = 0,
>>>> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>> -	.desc  = IORES_DESC_CRASH_KERNEL
>>>> -};
>>>> -struct resource crashk_low_res = {
>>>> -	.name  = "Crash kernel",
>>>> -	.start = 0,
>>>> -	.end   = 0,
>>>> -	.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
>>>> -	.desc  = IORES_DESC_CRASH_KERNEL
>>>> -};
>>>> -
>>>>  int kexec_should_crash(struct task_struct *p)
>>>>  {
>>>>  	/*
>>>> -- 
>>>> 2.7.4
>>>>
>>>
>>> Thanks
>>> Dave
>>>
>>>
>>> .
>>>
>> Thanks,
>> Chen Zhou
>>
> 
> Thanks
> Dave
> 
> 

Thanks,
Chen Zhou

Powered by blists - more mailing lists