Message-ID: <99bb0514-61a2-db5f-5cc8-bac5e4283d19@oracle.com>
Date: Fri, 10 Feb 2023 10:51:14 -0600
From: Eric DeVolder <eric.devolder@...cle.com>
To: Sourabh Jain <sourabhjain@...ux.ibm.com>,
linux-kernel@...r.kernel.org, x86@...nel.org,
kexec@...ts.infradead.org, ebiederm@...ssion.com,
dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com,
nramas@...ux.microsoft.com, thomas.lendacky@....com,
robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
konrad.wilk@...cle.com, boris.ostrovsky@...cle.com
Subject: Re: [PATCH v18 3/7] crash: add generic infrastructure for crash
hotplug support
On 2/9/23 13:10, Sourabh Jain wrote:
> Hello Eric,
>
> On 01/02/23 04:12, Eric DeVolder wrote:
>> To support crash hotplug, a mechanism is needed to update the crash
>> elfcorehdr upon CPU or memory changes (e.g. hot un/plug or off/
>> onlining).
>>
>> To track CPU changes, callbacks are registered with the cpuhp
>> mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
>> crash hotplug elfcorehdr update has no explicit ordering requirement
>> (relative to other cpuhp states), so meets the criteria for
>> utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
>> state and avoids the need to introduce a new state for crash
>> hotplug. Also, this is the last state in the PREPARE group, just
>> prior to the STARTING group, which is very close to the CPU
>> starting up in a plug/online situation, or stopping in an unplug/
>> offline situation. This minimizes the window of time during an
>> actual plug/online or unplug/offline situation in which the
>> elfcorehdr would be inaccurate.
>>
>> Note that when a CPU is being unplugged/offlined, the CPU is still
>> visited by for_each_present_cpu() during the regeneration of the
>> elfcorehdr. Thus there is a need to explicitly check and exclude
>> the soon-to-be offlined CPU. See patch 'kexec: exclude hot remove
>> cpu from elfcorehdr notes'.
>>
>> To track memory changes, a notifier is registered to capture the
>> memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
>>
>> The cpu callbacks and memory notifiers invoke handle_hotplug_event()
>> which performs needed tasks and then dispatches the event to the
>> architecture specific arch_crash_handle_hotplug_event() to update the
>> elfcorehdr with the current state of CPUs and memory. During the
>> process, the kexec_lock is held.
>>
>> Signed-off-by: Eric DeVolder <eric.devolder@...cle.com>
>> Acked-by: Baoquan He <bhe@...hat.com>
>> ---
>> include/linux/crash_core.h | 9 +++
>> include/linux/kexec.h | 12 ++++
>> kernel/crash_core.c | 139 +++++++++++++++++++++++++++++++++++++
>> 3 files changed, 160 insertions(+)
>>
>> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
>> index de62a722431e..ed868d237c07 100644
>> --- a/include/linux/crash_core.h
>> +++ b/include/linux/crash_core.h
>> @@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>> int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>> unsigned long long *crash_size, unsigned long long *crash_base);
>> +#define KEXEC_CRASH_HP_NONE 0
>> +#define KEXEC_CRASH_HP_REMOVE_CPU 1
>> +#define KEXEC_CRASH_HP_ADD_CPU 2
>> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 3
>> +#define KEXEC_CRASH_HP_ADD_MEMORY 4
>> +#define KEXEC_CRASH_HP_INVALID_CPU -1U
>> +
>> +struct kimage;
>> +
>> #endif /* LINUX_CRASH_CORE_H */
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 27ef420c7a45..a52624ae4452 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
>> #include <linux/compat.h>
>> #include <linux/ioport.h>
>> #include <linux/module.h>
>> +#include <linux/highmem.h>
>> #include <asm/kexec.h>
>> /* Verify architecture specific macros are defined */
>> @@ -371,6 +372,13 @@ struct kimage {
>> struct purgatory_info purgatory_info;
>> #endif
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> + int hp_action;
>> + unsigned int offlinecpu;
>> + bool elfcorehdr_index_valid;
>> + int elfcorehdr_index;
>
> Maybe I am repeating myself, but I think we can manage without elfcorehdr_index_valid.
>
> Here is how:
> Initialize elfcorehdr_index with a negative value in the do_kimage_alloc_init()
> function (it is called for both kexec_load and kexec_file_load).
>
> Then, when control reaches the handle_hotplug_event() function and elfcorehdr_index
> is negative, find the correct index and re-initialize elfcorehdr_index.
>
> Thoughts?
>
> Thanks,
> Sourabh Jain
>
ok, I'll eliminate elfcorehdr_index_valid.
eric
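
For illustration only, here is a minimal userspace sketch of the approach
suggested above: a negative elfcorehdr_index serves as the "not yet located"
marker, so no separate elfcorehdr_index_valid flag is needed. This is not the
kernel code. The struct layout, the sketch_* names, SKETCH_MAX_SEGMENTS, and
the way the segment is recognized (matching a recorded load address) are all
assumptions made for the example.

```c
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; NOT the real definitions. */
#define SKETCH_MAX_SEGMENTS 16

struct sketch_segment {
	unsigned long mem;		/* start address of the segment */
};

struct sketch_kimage {
	int nr_segments;
	struct sketch_segment segment[SKETCH_MAX_SEGMENTS];
	unsigned long elf_load_addr;	/* assumed: where the elfcorehdr was loaded */
	int elfcorehdr_index;		/* negative means "not located yet" */
};

/* Plays the role of do_kimage_alloc_init(): start with a negative index. */
static void sketch_image_init(struct sketch_kimage *image)
{
	memset(image, 0, sizeof(*image));
	image->elfcorehdr_index = -1;
}

/*
 * Plays the role of the lookup in handle_hotplug_event(): if the index is
 * still negative, search the segments once and cache the result.
 * Returns the index, or a negative value if no segment matches.
 */
static int sketch_find_elfcorehdr(struct sketch_kimage *image)
{
	int n;

	if (image->elfcorehdr_index >= 0)
		return image->elfcorehdr_index;	/* already located */

	for (n = 0; n < image->nr_segments; n++) {
		if (image->segment[n].mem == image->elf_load_addr) {
			image->elfcorehdr_index = n;
			break;
		}
	}
	return image->elfcorehdr_index;
}
```

With this shape, a caller only has to test the return value for a negative
number before touching the segment; the first successful lookup is cached and
later hotplug events skip the search.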