[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b635c9a3-35aa-55ea-6dcf-ff3afddbb65e@oracle.com>
Date: Tue, 9 May 2023 20:35:53 +0000
From: Eric DeVolder <eric.devolder@...cle.com>
To: Sourabh Jain <sourabhjain@...ux.ibm.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"kexec@...ts.infradead.org" <kexec@...ts.infradead.org>,
"ebiederm@...ssion.com" <ebiederm@...ssion.com>,
"dyoung@...hat.com" <dyoung@...hat.com>,
"bhe@...hat.com" <bhe@...hat.com>,
"vgoyal@...hat.com" <vgoyal@...hat.com>
CC: "tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"hpa@...or.com" <hpa@...or.com>,
"nramas@...ux.microsoft.com" <nramas@...ux.microsoft.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
"robh@...nel.org" <robh@...nel.org>,
"efault@....de" <efault@....de>,
"rppt@...nel.org" <rppt@...nel.org>,
"david@...hat.com" <david@...hat.com>,
Konrad Wilk <konrad.wilk@...cle.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
On 5/9/23 01:56, Sourabh Jain wrote:
>
> On 04/05/23 04:11, Eric DeVolder wrote:
>> The hotplug support for kexec_load() requires coordination with
>> userspace, and therefore a little extra help from the kernel to
>> facilitate the coordination.
>>
>> In the absence of the solution contained within this particular
>> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
>> then the crash hotplug logic would find the segment containing the
>> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
>> generally speaking that is the desired behavior and outcome, a
>> problem arises from the fact that if the kdump image includes a
>> purgatory that performs a digest checksum, then that check would
>> fail (because the elfcorehdr was changed), and the capture kernel
>> would fail to boot and no kdump occur.
>>
>> Therefore, what is needed is for the userspace kexec-tools to
>> indicate to the kernel whether or not the supplied kdump image/
>> elfcorehdr can be modified (because the kexec-tools excludes the
>> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
>> appropriately).
>>
>> To solve these problems, this patch introduces:
>> - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
>> safe for the kernel to modify the elfcorehdr (because kexec-tools
>> has excluded the elfcorehdr from the digest).
>> - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>> kexec-tools what the preferred size of the elfcorehdr memory buffer
>> should be in order to accommodate hotplug changes.
>> - The sysfs crash_hotplug nodes (ie.
>> /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>> that they examine kexec_file_load() vs kexec_load(), and when
>> kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>> This is critical so that the udev rule processing of crash_hotplug
>> indicates correctly (ie. the userspace unload-then-load of the
>> kdump of the kdump image can be skipped, or not).
>>
>> With this patch in place, I believe the following statements to be true
>> (with local testing to verify):
>>
>> - For systems which have these kernel changes in place, but not the
>> corresponding changes to the crash hot plug udev rules and
>> kexec-tools, (ie "older" systems) those systems will continue to
>> unload-then-load the kdump image, as has always been done. The
>> kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>> - For systems which have these kernel changes in place and the proposed
>> udev rule changes in place, but not the kexec-tools changes in place:
>> - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>> so the unload-then-reload of kdump image will occur (the sysfs
>> crash_hotplug nodes will show 0).
>> - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>> to show 1, and the kernel will modify the elfcorehdr directly. And
>> with the udev changes in place, the unload-then-load will not occur!
>> - For systems which have these kernel changes as well as the udev and
>> kexec-tools changes in place, then the user/admin has full authority
>> over the enablement and support of crash hotplug support, whether via
>> kexec_file_load() or kexec_load().
>>
>> Said differently, as kexec_load() was/is widely in use, these changes
>> permit it to continue to be used as-is (retaining the current unload-then-
>> reload behavior) until such time as the udev and kexec-tools changes can
>> be rolled out as well.
>>
>> I've intentionally kept the changes related to userspace coordination
>> for kexec_load() separate as this need was identified late; the
>> rest of this series has been generally reviewed and accepted. Once
>> this support has been vetted, I can refactor if needed.
>>
>> Suggested-by: Hari Bathini <hbathini@...ux.ibm.com>
>> Signed-off-by: Eric DeVolder <eric.devolder@...cle.com>
>> ---
>> arch/x86/include/asm/kexec.h | 11 +++++++----
>> arch/x86/kernel/crash.c | 27 +++++++++++++++++++++++++++
>> include/linux/kexec.h | 14 ++++++++++++--
>> include/uapi/linux/kexec.h | 1 +
>> kernel/crash_core.c | 31 +++++++++++++++++++++++++++++++
>> kernel/kexec.c | 3 +++
>> kernel/ksysfs.c | 15 +++++++++++++++
>> 7 files changed, 96 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 9143100ea3ea..3be6a98751f0 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>> #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>> #ifdef CONFIG_HOTPLUG_CPU
>> -static inline int crash_hotplug_cpu_support(void) { return 1; }
>> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
>> +int arch_crash_hotplug_cpu_support(void);
>> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>> #endif
>> #ifdef CONFIG_MEMORY_HOTPLUG
>> -static inline int crash_hotplug_memory_support(void) { return 1; }
>> -#define crash_hotplug_memory_support crash_hotplug_memory_support
>> +int arch_crash_hotplug_memory_support(void);
>> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>> #endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void);
>> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>> #endif
>> #endif /* __ASSEMBLY__ */
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 0c9d496cf7ce..8064e65de6c0 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>> #undef pr_fmt
>> #define pr_fmt(fmt) "crash hp: " fmt
>> +/* These functions provide the value for the sysfs crash_hotplug nodes */
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +int arch_crash_hotplug_cpu_support(void)
>> +{
>> + return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +int arch_crash_hotplug_memory_support(void)
>> +{
>> + return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void)
>> +{
>> + unsigned int sz;
>> +
>> + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>> + sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
>> + else
>> + sz += 2 + CONFIG_NR_CPUS_DEFAULT;
>
> If the sz holds the garbage value we have issues in else part. Or if you expecting
> sz to be 0 then += is not needed in the else part.
>
> How to doing this way?
>
> unsigned int sz;
>
> sz = 2 + CONFIG_NR_CPUS_DEFAULT;
>
> if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> sz += CRASH_MAX_MEMORY_RANGES
>
>
> Thanks,
> Sourabh Jain
>
Thanks for catching this mistake, sz is to be initialized to zero.
eric
>> + sz *= sizeof(Elf64_Phdr);
>> + return sz;
>> +}
>> +
>> /**
>> * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>> * @image: the active struct kimage
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 6a8a724ac638..050e20066cdb 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -335,6 +335,10 @@ struct kimage {
>> unsigned int preserve_context : 1;
>> /* If set, we are using file mode kexec syscall */
>> unsigned int file_mode:1;
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> + /* If set, allow changes to elfcorehdr of kexec_load'd image */
>> + unsigned int update_elfcorehdr:1;
>> +#endif
>> #ifdef ARCH_HAS_KIMAGE_ARCH
>> struct kimage_arch arch;
>> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>> /* List of defined/legal kexec flags */
>> #ifndef CONFIG_KEXEC_JUMP
>> -#define KEXEC_FLAGS KEXEC_ON_CRASH
>> +#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>> #else
>> -#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
>> +#define KEXEC_FLAGS (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>> #endif
>> /* List of defined/legal kexec file flags */
>> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>> static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>> #endif
>> +int crash_check_update_elfcorehdr(void);
>> +
>> #ifndef crash_hotplug_cpu_support
>> static inline int crash_hotplug_cpu_support(void) { return 0; }
>> #endif
>> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>> static inline int crash_hotplug_memory_support(void) { return 0; }
>> #endif
>> +#ifndef crash_get_elfcorehdr_size
>> +static inline crash_get_elfcorehdr_size(void) { return 0; }
>> +#endif
>> +
>> #else /* !CONFIG_KEXEC_CORE */
>> struct pt_regs;
>> struct task_struct;
>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 981016e05cfa..01766dd839b0 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -12,6 +12,7 @@
>> /* kexec flags for different usage scenarios */
>> #define KEXEC_ON_CRASH 0x00000001
>> #define KEXEC_PRESERVE_CONTEXT 0x00000002
>> +#define KEXEC_UPDATE_ELFCOREHDR 0x00000004
>> #define KEXEC_ARCH_MASK 0xffff0000
>> /*
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index ef6e91daad56..e05bfdb7eaed 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>> #ifdef CONFIG_CRASH_HOTPLUG
>> #undef pr_fmt
>> #define pr_fmt(fmt) "crash hp: " fmt
>> +
>> +/*
>> + * This routine utilized when the crash_hotplug sysfs node is read.
>> + * It reflects the kernel's ability/permission to update the crash
>> + * elfcorehdr directly.
>> + */
>> +int crash_check_update_elfcorehdr(void)
>> +{
>> + int rc = 0;
>> +
>> + /* Obtain lock while reading crash information */
>> + if (!kexec_trylock()) {
>> + pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
>> + return 0;
>> + }
>> + if (kexec_crash_image) {
>> + if (kexec_crash_image->file_mode)
>> + rc = 1;
>> + else
>> + rc = kexec_crash_image->update_elfcorehdr;
>> + }
>> + /* Release lock now that update complete */
>> + kexec_unlock();
>> +
>> + return rc;
>> +}
>> +
>> /*
>> * To accurately reflect hot un/plug changes of cpu and memory resources
>> * (including onling and offlining of those resources), the elfcorehdr
>> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>> image = kexec_crash_image;
>> + /* Check that updating elfcorehdr is permitted */
>> + if (!(image->file_mode || image->update_elfcorehdr))
>> + goto out;
>> +
>> if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>> hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>> pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index 92d301f98776..60de64bd14b9 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>> if (flags & KEXEC_PRESERVE_CONTEXT)
>> image->preserve_context = 1;
>> + if (flags & KEXEC_UPDATE_ELFCOREHDR)
>> + image->update_elfcorehdr = 1;
>> +
>> ret = machine_kexec_prepare(image);
>> if (ret)
>> goto out;
>> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
>> index aad7a3bfd846..1d4bc493b2f4 100644
>> --- a/kernel/ksysfs.c
>> +++ b/kernel/ksysfs.c
>> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>> }
>> KERNEL_ATTR_RO(vmcoreinfo);
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
>> + struct kobj_attribute *attr, char *buf)
>> +{
>> + unsigned int sz = crash_get_elfcorehdr_size();
>> +
>> + return sysfs_emit(buf, "%u\n", sz);
>> +}
>> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
>> +
>> +#endif
>> +
>> #endif /* CONFIG_CRASH_CORE */
>> /* whether file capabilities are enabled */
>> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>> #endif
>> #ifdef CONFIG_CRASH_CORE
>> &vmcoreinfo_attr.attr,
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> + &crash_elfcorehdr_size_attr.attr,
>> +#endif
>> #endif
>> #ifndef CONFIG_TINY_RCU
>> &rcu_expedited_attr.attr,
Powered by blists - more mailing lists