Date:   Thu, 5 May 2022 11:31:05 -0500
From:   Eric DeVolder <eric.devolder@...cle.com>
To:     Sourabh Jain <sourabhjain@...ux.ibm.com>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        kexec@...ts.infradead.org, ebiederm@...ssion.com,
        dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com,
        nramas@...ux.microsoft.com, thomas.lendacky@....com,
        robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
        konrad.wilk@...cle.com, boris.ostrovsky@...cle.com
Subject: Re: [PATCH v7 2/8] x86/crash: Introduce new options to support cpu
 and memory hotplug



On 4/29/22 01:41, Sourabh Jain wrote:
> 
> On 26/04/22 20:09, Eric DeVolder wrote:
>>
>>
>> On 4/25/22 23:21, Sourabh Jain wrote:
>>>
>>> On 13/04/22 22:12, Eric DeVolder wrote:
>>>> CRASH_HOTPLUG is to enable cpu and memory hotplug support of crash.
>>>>
>>>> CRASH_HOTPLUG_ELFCOREHDR_SZ is used to specify the maximum size of
>>>> the elfcorehdr buffer/segment.
>>>>
>>>> This is a preparation for later usage.
>>>>
>>>> Signed-off-by: Eric DeVolder <eric.devolder@...cle.com>
>>>> Acked-by: Baoquan He <bhe@...hat.com>
>>>> ---
>>>>   arch/x86/Kconfig | 26 ++++++++++++++++++++++++++
>>>>   1 file changed, 26 insertions(+)
>>>>
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index b0142e01002e..f7b92ee1bcc7 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -2072,6 +2072,32 @@ config CRASH_DUMP
>>>>         (CONFIG_RELOCATABLE=y).
>>>>         For more details see Documentation/admin-guide/kdump/kdump.rst
>>>> +config CRASH_HOTPLUG
>>>> +    bool "kernel updates of crash elfcorehdr"
>>>> +    depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG) && KEXEC_FILE
>>>> +    help
>>>> +      Enable the kernel to update the crash elfcorehdr (which contains
>>>> +      the list of CPUs and memory regions) directly upon hot plug/unplug
>>>> +      of CPUs or memory. Otherwise userspace must monitor these hot
>>>> +      plug/unplug change notifications via udev in order to
>>>> +      unload-then-reload the crash kernel so that the list of CPUs and
>>>> +      memory regions is kept up-to-date. Note that the udev CPU and
>>>> +      memory change notifications still occur (however, userspace is not
>>>> +      required to monitor for crash dump purposes).
>>>> +
>>>> +config CRASH_HOTPLUG_ELFCOREHDR_SZ
>>>> +    depends on CRASH_HOTPLUG
>>>> +    int
>>>> +    default 131072
>>>> +    help
>>>> +      Specify the maximum size of the elfcorehdr buffer/segment.
>>>> +      The 128KiB default is sized so that it can accommodate 2048
>>>> +      Elf64_Phdr, where each Phdr represents either a CPU or a
>>>> +      region of memory.
>>>> +      For example, this size can accommodate a machine with up to 1024
>>>> +      CPUs and up to 1024 memory regions, eg. as represented by the
>>>> +      'System RAM' entries in /proc/iomem.
>>>
>>> Is it possible to get rid of CRASH_HOTPLUG_ELFCOREHDR_SZ?
>> At the moment, I do not think so. The idea behind this value is to represent the largest number of 
>> CPUs and memory regions possible in the system. Today there is NR_CPUS, which could be used for 
>> CPUs, but there isn't a similar value for memory: I am not aware of a kernel variable that 
>> represents the maximum number of memory regions. If there is, please let me know!
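[To make the sizing discussion above concrete, here is a rough sketch of the arithmetic. This is a hypothetical helper, not kernel code; it assumes the standard ELF-64 structure sizes (64 bytes for Elf64_Ehdr, 56 bytes for Elf64_Phdr).]

```python
# Rough sizing sketch for the elfcorehdr segment (hypothetical helper,
# not kernel code). Structure sizes are from the ELF-64 specification:
# Elf64_Ehdr is 64 bytes, Elf64_Phdr is 56 bytes.
ELF64_EHDR_SZ = 64
ELF64_PHDR_SZ = 56

def elfcorehdr_bytes(nr_cpus: int, nr_mem_ranges: int) -> int:
    # One Phdr per CPU note plus one per memory region, on top of the
    # ELF header itself.
    return ELF64_EHDR_SZ + (nr_cpus + nr_mem_ranges) * ELF64_PHDR_SZ

# The 128 KiB (131072-byte) default fits 2048 Phdrs with headroom,
# e.g. 1024 CPUs plus 1024 'System RAM' regions:
print(elfcorehdr_bytes(1024, 1024))  # 114752 bytes, under 131072
```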
>>>
>>> How about finding the additional buffer space needed for future CPU and memory
>>> additions during the kdump load? Not sure about the feasibility of doing this in
>>> the kexec tool (userspace).
>>
>> I may not understand what you are asking, but the x86 code, for kexec_file_load, does in fact 
>> allocate all the space needed (currently via CRASH_HOTPLUG_ELFCOREHDR_SZ) upon kdump load.
>>
>> For kexec_load, I've had no problem asking the kexec tool to allocate a larger piece of memory for 
>> the elfcorehdr. But it is the same problem as with CRASH_HOTPLUG_ELFCOREHDR_SZ: how big? In my 
>> workspace I tell the kexec tool how big. If there were sysfs-visible values for NR_CPUS and the 
>> maximum number of memory regions, then we could have kexec pull those and compute the size.
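[As a sketch of the userspace computation suggested above: Linux does expose the possible-CPU mask in /sys/devices/system/cpu/possible as a range list such as "0-511". The helper name below is made up, but the file format is the standard one, so a tool like kexec could parse it to bound the CPU Phdr count.]

```python
# Hypothetical userspace sketch: parse the possible-CPU range list that
# Linux exposes in /sys/devices/system/cpu/possible (e.g. "0-511" or
# "0,2-5"), giving an upper bound on the CPU Phdrs in the elfcorehdr.
def count_possible_cpus(ranges: str) -> int:
    total = 0
    for part in ranges.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            total += int(hi) - int(lo) + 1
        else:
            total += 1
    return total

# A tool could then size the buffer from the live system, e.g.:
# with open("/sys/devices/system/cpu/possible") as f:
#     nr_cpus = count_possible_cpus(f.read())
print(count_possible_cpus("0-511"))  # 512
```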
> 
> Yeah, dynamic calculation of the PT_LOAD sections needed for possible memory may not be straightforward. 
> But I still did not get the rationale for limiting the possible PT_LOAD sections or memory ranges to 
> only 1024, given that in the kexec tool the max memory ranges for x86 is 32K.
> 
> commit 1bc7bc7649fa29d95c98f6a6d8dd2f08734a865c
> Author: David Hildenbrand <david@...hat.com>
> Date:   Tue Mar 23 11:01:10 2021 +0100
> 
>      crashdump/x86: increase CRASH_MAX_MEMORY_RANGES to 32k
> 
>      virtio-mem in Linux adds/removes individual memory blocks (e.g., 128 MB
>      each). Linux merges adjacent memory blocks added by virtio-mem devices, but
>      we can still end up with a very sparse memory layout when unplugging
>      memory in corner cases.
> 
>      Let's increase the maximum number of crash memory ranges from ~2k to 32k.
>      32k should be sufficient for a very long time.
> 
>      e_phnum field in the header is 16 bits wide, so we can fit a maximum of
>      ~64k entries in there, shared with other entries (i.e., CPU). Therefore,
>      using up to 32k memory ranges is fine. (if we ever need more than ~64k,
> 
> Do you see any issue if we increase the memory range count to 32K?

No, I do not. Allowing for 32K ranges means the elfcorehdr buffer is now 2MiB.
I'm thinking I'll redefine/rename this config option to mirror CRASH_MAX_MEMORY_RANGES
and default it to 32K. Then the buffer math will take both NR_CPUS and this
value into account when computing the buffer size.
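[A back-of-the-envelope check of the 2MiB figure, assuming the 64-bytes-per-Phdr budget implied by the existing help text (131072 bytes / 2048 Phdrs = 64); the variable names are illustrative only.]

```python
# Rough check of the 2 MiB estimate for 32K memory ranges, using the
# 64-bytes-per-Phdr budget implied by the existing Kconfig help text
# (131072 bytes / 2048 Phdrs = 64 bytes each).
PHDR_BUDGET = 131072 // 2048            # 64 bytes reserved per Phdr
mem_ranges_bytes = 32 * 1024 * PHDR_BUDGET
print(mem_ranges_bytes)                 # 2097152 bytes == 2 MiB
# CPU Phdrs (up to NR_CPUS of them) would come on top of this, so the
# combined buffer math would be roughly NR_CPUS * 64 + 2 MiB.
```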

Thanks!
eric

> 
> Thanks,
> Sourabh Jain
> 
