linux-kernel - Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into crash

Message-Id: <08C19FFB-C6FC-4BB7-A1C2-67CE6B99D2AB@oracle.com>
Date:   Thu, 16 Jan 2020 09:47:55 -0600
From:   John Donnelly <john.p.donnelly@...cle.com>
To:     James Morse <james.morse@....com>
Cc:     Dave Young <dyoung@...hat.com>, Chen Zhou <chenzhou10@...wei.com>,
        kbuild test robot <lkp@...el.com>, horms@...ge.net.au,
        linux-doc@...r.kernel.org, catalin.marinas@....com,
        bhsharma@...hat.com, xiexiuqi@...wei.com,
        kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
        mingo@...hat.com, tglx@...utronix.de, will@...nel.org,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v7 1/4] x86: kdump: move reserve_crashkernel_low() into
 crash_core.c



> On Jan 16, 2020, at 9:17 AM, James Morse <james.morse@....com> wrote:
> 
> Hi guys,
> 
> On 28/12/2019 09:32, Dave Young wrote:
>> On 12/27/19 at 07:04pm, Chen Zhou wrote:
>>> On 2019/12/27 13:54, Dave Young wrote:
>>>> On 12/23/19 at 11:23pm, Chen Zhou wrote:
>>>>> In preparation for supporting reserve_crashkernel_low in arm64 as
>>>>> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
>>>>> 
>>>>> Note, in arm64, we reserve low memory if and only if crashkernel=X,low
>>>>> is specified. Unlike x86_64, we don't set low memory automatically.
>>>> 
>>>> Do you have any reason for the difference?  I'd expect we have same
>>>> logic if possible and remove some of the ifdefs.
>>> 
>>> In x86_64, if we reserve crashkernel above 4G, then we call reserve_crashkernel_low()
>>> to reserve low memory.
>>> 
>>> In arm64, to simplify, we call reserve_crashkernel_low() at the beginning of reserve_crashkernel()
>>> and then relax the arm64_dma32_phys_limit if reserve_crashkernel_low() allocated something.
>>> In this case, if the crashkernel is reserved below 4G, 256M of low memory will still be set
>>> automatically, and this needs extra consideration.
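
The difference between the two policies described above can be sketched as follows. This is only an illustration of the decision, not kernel code: the function name `low_reservation`, the policy strings, and the use of 256M as the automatic default are assumptions taken from this thread's description.

```python
GIB = 1 << 30
MIB = 1 << 20

# Assumption: 256M is the automatic low size mentioned in this thread.
DEFAULT_LOW = 256 * MIB


def low_reservation(policy, crash_base, low_size=None):
    """Return how many bytes of sub-4G memory to reserve in addition.

    policy     -- "x86_64" or "arm64" (hypothetical labels for the two
                  behaviours discussed in the thread)
    crash_base -- physical base address chosen for the main reservation
    low_size   -- size given via crashkernel=X,low, or None if absent
    """
    if policy == "x86_64":
        # Low memory is only added when the main region is at or above 4G.
        if crash_base < 4 * GIB:
            return 0
        # An explicit crashkernel=X,low wins; otherwise use the default.
        return DEFAULT_LOW if low_size is None else low_size
    if policy == "arm64":
        # Proposed behaviour: reserve low memory if and only if
        # crashkernel=X,low was specified.
        return 0 if low_size is None else low_size
    raise ValueError("unknown policy: %r" % (policy,))
```

With the same command line (no `,low` option) and a main region above 4G, the first policy yields 256M of low memory and the second yields none, which is the inconsistency being discussed.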
> 
>> Sorry, I did not read the old thread in detail and thought this was
>> arch dependent.  But on reflection, it would be better to
>> have the same semantics for crashkernel parameters across arches.  If we
>> make them different, it causes confusion, especially for
>> distributions.
> 
> Surely distros also want one crashkernel* string they can use on all platforms without
> having to detect the kernel version, platform or changeable memory layout...
> 
> 
>> OTOH, I thought if we reserve high memory then the low memory should be
>> needed.  There might be some exceptions, but I do not know the exact
>> one,
> 
>> can we make the behavior same, and special case those systems which
>> do not need low memory reservation.
> 
> It's tricky to work out which systems are the 'normal' ones.
> 
> We don't have a fixed memory layout for arm64. Some systems have no memory below 4G.
> Others have no memory above 4G.
> 
> Chen Zhou's machine has some memory below 4G, but it's too precious to reserve a large
> chunk for kdump. Without any memory below 4G some of the drivers won't work.
> 
> I don't see what distros can set as their default for all platforms if high/low are
> mutually exclusive with the 'crashkernel=' in use today. How did x86 navigate this, ... or
> was it so long ago?
> 
> No one else has reported a problem with the existing placement logic, hence treating this
> 'low' thing as the 'in addition' special case.


Hi,

I am seeing similar Arm crash dump issues on 5.4 kernels, where we need a rather large crashkernel reservation that is not available below 4GB (the maximum reserved size appears to be around 768M). When I pick a memory range above 4GB, I see adapters that fail to initialize:


There is no low (<4G) memory available for DMA:

[   11.506792] kworker/0:14: page allocation failure: order:0, mode:0x104(GFP_DMA32|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[   11.518793] CPU: 0 PID: 150 Comm: kworker/0:14 Not tainted 5.4.0-1948.3.el8uek.aarch64 #1
[   11.526955] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL025 01/18/2019
[   11.534948] Workqueue: events work_for_cpu_fn 
[   11.539291] Call trace: 
[   11.541727]  dump_backtrace+0x0/0x18c 
[   11.545376]  show_stack+0x24/0x30 
[   11.548679]  dump_stack+0xbc/0xe0 
[   11.551982]  warn_alloc+0xf0/0x15c 
[   11.555370]  __alloc_pages_slowpath+0xb4c/0xb84 
[   11.559887]  __alloc_pages_nodemask+0x2d0/0x330 
[   11.564405]  alloc_pages_current+0x8c/0xf8 
[   11.568496]  ttm_bo_device_init+0x188/0x220 [ttm] 
[   11.573187]  drm_vram_mm_init+0x58/0x80 [drm_vram_helper] 
[   11.578572]  drm_vram_helper_alloc_mm+0x64/0xb0 [drm_vram_helper] 
[   11.584655]  ast_mm_init+0x38/0x80 [ast] 
[   11.588566]  ast_driver_load+0x474/0xa70 [ast] 
[   11.593029]  drm_dev_register+0x144/0x1c8 [drm] 
[   11.597573]  drm_get_pci_dev+0xa4/0x168 [drm] 
[   11.601919]  ast_pci_probe+0x8c/0x9c [ast] 
[   11.606004]  local_pci_probe+0x44/0x98 
[   11.609739]  work_for_cpu_fn+0x20/0x30 
[   11.613474]  process_one_work+0x1c4/0x41c 
[   11.617470]  worker_thread+0x150/0x4b0 
[   11.621206]  kthread+0x110/0x114 
[   11.624422]  ret_from_fork+0x10/0x18 

This failure is related to a graphics adapter. 
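
For reference, the `mode:0x104` field in the first log line can be decoded as below. This is a minimal sketch: the two bit values are inferred from the log line itself (0x104 is printed as GFP_DMA32|__GFP_ZERO) and are assumed to match the 5.4 flag layout; `decode_gfp` is a hypothetical helper, not a kernel or tool API.

```python
# Assumption: bit values for a 5.4 kernel, consistent with the log line
# "mode:0x104(GFP_DMA32|__GFP_ZERO)". Other flags are deliberately omitted.
GFP_BITS = {
    0x004: "GFP_DMA32",   # allocation must come from below 4G
    0x100: "__GFP_ZERO",  # page must be zeroed before it is returned
}


def decode_gfp(mode):
    """Return a '|'-joined string of the known flag names set in mode."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    # Report any bits this small table does not know about as raw hex.
    unknown = mode & ~sum(GFP_BITS)
    if unknown:
        names.append(hex(unknown))
    return "|".join(names)
```

The GFP_DMA32 bit is the relevant one here: the driver asks for a page below 4G, and with the crashkernel placed entirely above 4G the capture kernel has none to give.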

The more complex kdump configurations, which use the networking stack to NFS-mount a filesystem to dump to or use ssh to copy to another machine, require a larger crashkernel reservation than perhaps the “default*” settings of a minimal kdump that writes a minimal vmcore to local storage in /var/crash. If crashkernel is too small I get Out of Memory issues and the entire vmcore process fails.

(*The default kdump settings, I assume, write a minimal vmcore to /var/crash on the primary boot device where /root is located.)




> 
> 
>>> Previous discussions:
>>> 	https://lkml.org/lkml/2019/6/5/670
>>> 	https://lkml.org/lkml/2019/6/13/229
>> 
>> Another concern from James:
>> "
>> With both crashk_low_res and crashk_res, we end up with two entries in /proc/iomem called
>> "Crash kernel". Because it's sorted by address, and kexec-tools stops searching when it
>> finds "Crash kernel", you are always going to get the kernel placed in the lower portion.
>> "
>> 
>> The kexec-tools code iterates over all "Crash kernel" ranges and adds them
>> to an array.  In the x86 code, it uses the higher range to locate memory.
> 
> Then my hurried reading of what the user-space code does was wrong!
> 
> If kexec-tools places the kernel in the low region, there may not be enough memory left
> for whatever purpose it was reserved for. This was the motivation for giving it a
> different name.
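
The range selection described above can be sketched as follows. This is not the kexec-tools code; it is a minimal illustration that assumes the usual "start-end : name" layout of /proc/iomem, and the function names and sample addresses are made up for the example.

```python
import re

# One /proc/iomem entry: optional indentation, hex start-end, then a name.
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def crash_kernel_ranges(iomem_text):
    """Collect every (start, end) range named exactly "Crash kernel"."""
    ranges = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and m.group(3).strip() == "Crash kernel":
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return ranges


def pick_load_range(ranges):
    """Pick the range with the highest start address, or None if empty."""
    return max(ranges) if ranges else None


# Hypothetical layout with one low and one high reservation.
SAMPLE = """\
80000000-ffffffff : System RAM
  e0000000-efffffff : Crash kernel
100000000-87fffffff : System RAM
  800000000-83fffffff : Crash kernel
"""
```

Stopping at the first match would return the low range in `SAMPLE`; collecting all matches and taking the highest leaves the low region free for DMA buffers.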
> 
> 
> Thanks,
> 
> James
> 