Message-ID: <bffb3d9e-1946-f4b6-d58c-9c44bc0bee26@oracle.com>
Date: Thu, 26 May 2022 08:16:16 -0500
From: Eric DeVolder <eric.devolder@...cle.com>
To: Sourabh Jain <sourabhjain@...ux.ibm.com>,
linux-kernel@...r.kernel.org, x86@...nel.org,
kexec@...ts.infradead.org, ebiederm@...ssion.com,
dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com,
nramas@...ux.microsoft.com, thomas.lendacky@....com,
robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
konrad.wilk@...cle.com, boris.ostrovsky@...cle.com
Subject: Re: [PATCH v8 0/7] crash: Kernel handling of CPU and memory hot un/plug
On 5/25/22 10:13, Sourabh Jain wrote:
> Hello Eric,
>
> On 06/05/22 00:15, Eric DeVolder wrote:
>> When the kdump service is loaded, if a CPU or memory is hot
>> un/plugged, the crash elfcorehdr (for x86), which describes the CPUs
>> and memory in the system, must also be updated, else the resulting
>> vmcore is inaccurate (eg. missing either CPU context or memory
>> regions).
>>
>> The current solution utilizes udev to initiate an unload-then-reload
>> of the kdump image (eg. kernel, initrd, boot_params, purgatory and
>> elfcorehdr) by the userspace kexec utility. In previous posts I have
>> outlined the significant performance problems related to offloading
>> this activity to userspace.
>>
>> This patchset introduces a generic crash hot un/plug handler that
>> registers with the CPU and memory notifiers. Upon CPU or memory
>> changes, this generic handler is invoked and performs important
>> housekeeping, for example obtaining the appropriate lock, and then
>> invokes an architecture specific handler to do the appropriate
>> updates.
>>
>> In the case of x86_64, the arch specific handler generates a new
>> elfcorehdr, and overwrites the old one in memory. No involvement
>> with userspace needed.
>>
>> To realize the benefits/test this patchset, one must make a couple
>> of minor changes to userspace:
>>
>> - Disable the udev rule for updating kdump on hot un/plug changes.
>> Add the following as the first two lines to the udev rule file
>> /usr/lib/udev/rules.d/98-kexec.rules:
>
> If we can have a sysfs attribute to advertise this feature then userspace
> utilities (kexec tool/udev rules) can take action accordingly. In short, it will
> help us maintain backward compatibility.
>
> kexec tool can use the new sysfs attribute and allocate additional buffer space
> for elfcorehdr accordingly. Similarly, the checksum-related changes can come
> under this check.
>
> The udev rule can use this sysfs file to decide whether a kdump service reload is required.
Great idea. I've been working on the corresponding udev and kexec-tools changes, and your input
here is quite timely.
I have a boolean "crash_hotplug" as a core_param(), so it will show up as:
# cat /sys/module/kernel/parameters/crash_hotplug
N
This will provide userspace the indication it needs.
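As a rough illustration of how a userspace tool might consume this, here is a minimal Python sketch. It assumes the parameter path shown above stays as-is and that the file reads back "Y" or "N", as boolean module parameters do. The function names are hypothetical and are not part of kexec-tools or the kdump scripts.

```python
from pathlib import Path

# Path taken from the example above; it is an assumption that this
# location and name stay unchanged in the final version of the patchset.
CRASH_HOTPLUG_PARAM = Path("/sys/module/kernel/parameters/crash_hotplug")


def kernel_handles_crash_hotplug(param_path=CRASH_HOTPLUG_PARAM):
    """Return True if the kernel reports that it handles hot un/plug itself."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        # File absent or unreadable: treat as a kernel without the feature,
        # which keeps the existing userspace behaviour (backward compatible).
        return False
    # Boolean module parameters read back as "Y" or "N".
    return value == "Y"


def kdump_reload_required(param_path=CRASH_HOTPLUG_PARAM):
    """Hypothetical helper: should userspace unload-then-reload kdump?"""
    return not kernel_handles_crash_hotplug(param_path)
```

A missing file is treated the same as "N", so the same check would work on kernels that predate the parameter.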
>
> Thanks,
> Sourabh Jain
>