[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b635c9a3-35aa-55ea-6dcf-ff3afddbb65e@oracle.com>
Date:   Tue, 9 May 2023 20:35:53 +0000
From:   Eric DeVolder <eric.devolder@...cle.com>
To:     Sourabh Jain <sourabhjain@...ux.ibm.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "kexec@...ts.infradead.org" <kexec@...ts.infradead.org>,
        "ebiederm@...ssion.com" <ebiederm@...ssion.com>,
        "dyoung@...hat.com" <dyoung@...hat.com>,
        "bhe@...hat.com" <bhe@...hat.com>,
        "vgoyal@...hat.com" <vgoyal@...hat.com>
CC:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "nramas@...ux.microsoft.com" <nramas@...ux.microsoft.com>,
        "thomas.lendacky@....com" <thomas.lendacky@....com>,
        "robh@...nel.org" <robh@...nel.org>,
        "efault@....de" <efault@....de>,
        "rppt@...nel.org" <rppt@...nel.org>,
        "david@...hat.com" <david@...hat.com>,
        Konrad Wilk <konrad.wilk@...cle.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: [PATCH v22 6/8] crash: hotplug support for kexec_load()
On 5/9/23 01:56, Sourabh Jain wrote:
> 
> On 04/05/23 04:11, Eric DeVolder wrote:
>> The hotplug support for kexec_load() requires coordination with
>> userspace, and therefore a little extra help from the kernel to
>> facilitate the coordination.
>>
>> In the absence of the solution contained within this particular
>> patch, if a kdump capture kernel is loaded via kexec_load() syscall,
>> then the crash hotplug logic would find the segment containing the
>> elfcorehdr, and upon a hotplug event, rewrite the elfcorehdr. While
>> generally speaking that is the desired behavior and outcome, a
>> problem arises from the fact that if the kdump image includes a
>> purgatory that performs a digest checksum, then that check would
>> fail (because the elfcorehdr was changed), and the capture kernel
>> would fail to boot and no kdump occur.
>>
>> Therefore, what is needed is for the userspace kexec-tools to
>> indicate to the kernel whether or not the supplied kdump image/
>> elfcorehdr can be modified (because the kexec-tools excludes the
>> elfcorehdr from the digest, and sizes the elfcorehdr memory buffer
>> appropriately).
>>
>> To solve these problems, this patch introduces:
>>   - a new kexec flag KEXEC_UPATE_ELFCOREHDR to indicate that it is
>>     safe for the kernel to modify the elfcorehdr (because kexec-tools
>>     has excluded the elfcorehdr from the digest).
>>   - the /sys/kernel/crash_elfcorehdr_size node to communicate to
>>     kexec-tools what the preferred size of the elfcorehdr memory buffer
>>     should be in order to accommodate hotplug changes.
>>   - The sysfs crash_hotplug nodes (ie.
>>     /sys/devices/system/[cpu|memory]/crash_hotplug) are now dynamic in
>>     that they examine kexec_file_load() vs kexec_load(), and when
>>     kexec_load(), whether or not KEXEC_UPDATE_ELFCOREHDR is in effect.
>>     This is critical so that the udev rule processing of crash_hotplug
>>     indicates correctly (ie. the userspace unload-then-load of the
>>     kdump of the kdump image can be skipped, or not).
>>
>> With this patch in place, I believe the following statements to be true
>> (with local testing to verify):
>>
>>   - For systems which have these kernel changes in place, but not the
>>     corresponding changes to the crash hot plug udev rules and
>>     kexec-tools, (ie "older" systems) those systems will continue to
>>     unload-then-load the kdump image, as has always been done. The
>>     kexec-tools will not set KEXEC_UPDATE_ELFCOREHDR.
>>   - For systems which have these kernel changes in place and the proposed
>>     udev rule changes in place, but not the kexec-tools changes in place:
>>      - the use of kexec_load() will not set KEXEC_UPDATE_ELFCOREHDR and
>>        so the unload-then-reload of kdump image will occur (the sysfs
>>        crash_hotplug nodes will show 0).
>>      - the use of kexec_file_load() will permit sysfs crash_hotplug nodes
>>        to show 1, and the kernel will modify the elfcorehdr directly. And
>>        with the udev changes in place, the unload-then-load will not occur!
>>   - For systems which have these kernel changes as well as the udev and
>>     kexec-tools changes in place, then the user/admin has full authority
>>     over the enablement and support of crash hotplug support, whether via
>>     kexec_file_load() or kexec_load().
>>
>> Said differently, as kexec_load() was/is widely in use, these changes
>> permit it to continue to be used as-is (retaining the current unload-then-
>> reload behavior) until such time as the udev and kexec-tools changes can
>> be rolled out as well.
>>
>> I've intentionally kept the changes related to userspace coordination
>> for kexec_load() separate as this need was identified late; the
>> rest of this series has been generally reviewed and accepted. Once
>> this support has been vetted, I can refactor if needed.
>>
>> Suggested-by: Hari Bathini <hbathini@...ux.ibm.com>
>> Signed-off-by: Eric DeVolder <eric.devolder@...cle.com>
>> ---
>>   arch/x86/include/asm/kexec.h | 11 +++++++----
>>   arch/x86/kernel/crash.c      | 27 +++++++++++++++++++++++++++
>>   include/linux/kexec.h        | 14 ++++++++++++--
>>   include/uapi/linux/kexec.h   |  1 +
>>   kernel/crash_core.c          | 31 +++++++++++++++++++++++++++++++
>>   kernel/kexec.c               |  3 +++
>>   kernel/ksysfs.c              | 15 +++++++++++++++
>>   7 files changed, 96 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 9143100ea3ea..3be6a98751f0 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -214,14 +214,17 @@ void arch_crash_handle_hotplug_event(struct kimage *image);
>>   #define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
>>   #ifdef CONFIG_HOTPLUG_CPU
>> -static inline int crash_hotplug_cpu_support(void) { return 1; }
>> -#define crash_hotplug_cpu_support crash_hotplug_cpu_support
>> +int arch_crash_hotplug_cpu_support(void);
>> +#define crash_hotplug_cpu_support arch_crash_hotplug_cpu_support
>>   #endif
>>   #ifdef CONFIG_MEMORY_HOTPLUG
>> -static inline int crash_hotplug_memory_support(void) { return 1; }
>> -#define crash_hotplug_memory_support crash_hotplug_memory_support
>> +int arch_crash_hotplug_memory_support(void);
>> +#define crash_hotplug_memory_support arch_crash_hotplug_memory_support
>>   #endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void);
>> +#define crash_get_elfcorehdr_size arch_crash_get_elfcorehdr_size
>>   #endif
>>   #endif /* __ASSEMBLY__ */
>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>> index 0c9d496cf7ce..8064e65de6c0 100644
>> --- a/arch/x86/kernel/crash.c
>> +++ b/arch/x86/kernel/crash.c
>> @@ -442,6 +442,33 @@ int crash_load_segments(struct kimage *image)
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +/* These functions provide the value for the sysfs crash_hotplug nodes */
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +int arch_crash_hotplug_cpu_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +int arch_crash_hotplug_memory_support(void)
>> +{
>> +    return crash_check_update_elfcorehdr();
>> +}
>> +#endif
>> +
>> +unsigned int arch_crash_get_elfcorehdr_size(void)
>> +{
>> +    unsigned int sz;
>> +
>> +    if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>> +        sz = 2 + CONFIG_NR_CPUS_DEFAULT + CRASH_MAX_MEMORY_RANGES;
>> +    else
>> +        sz += 2 + CONFIG_NR_CPUS_DEFAULT;
> 
> If the sz holds the garbage value we have issues in else part. Or if you expecting
> sz to be 0 then += is not needed in the else part.
> 
> How to doing this way?
> 
> unsigned int sz;
> 
> sz = 2 + CONFIG_NR_CPUS_DEFAULT;
> 
> if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
>      sz += CRASH_MAX_MEMORY_RANGES
> 
> 
> Thanks,
> Sourabh Jain
> 
Thanks for catching this mistake, sz is to be initialized to zero.
eric
>> +    sz *= sizeof(Elf64_Phdr);
>> +    return sz;
>> +}
>> +
>>   /**
>>    * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
>>    * @image: the active struct kimage
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 6a8a724ac638..050e20066cdb 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -335,6 +335,10 @@ struct kimage {
>>       unsigned int preserve_context : 1;
>>       /* If set, we are using file mode kexec syscall */
>>       unsigned int file_mode:1;
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    /* If set, allow changes to elfcorehdr of kexec_load'd image */
>> +    unsigned int update_elfcorehdr:1;
>> +#endif
>>   #ifdef ARCH_HAS_KIMAGE_ARCH
>>       struct kimage_arch arch;
>> @@ -411,9 +415,9 @@ bool kexec_load_permitted(int kexec_image_type);
>>   /* List of defined/legal kexec flags */
>>   #ifndef CONFIG_KEXEC_JUMP
>> -#define KEXEC_FLAGS    KEXEC_ON_CRASH
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_UPDATE_ELFCOREHDR)
>>   #else
>> -#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
>> +#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_UPDATE_ELFCOREHDR)
>>   #endif
>>   /* List of defined/legal kexec file flags */
>> @@ -501,6 +505,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
>>   static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
>>   #endif
>> +int crash_check_update_elfcorehdr(void);
>> +
>>   #ifndef crash_hotplug_cpu_support
>>   static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   #endif
>> @@ -509,6 +515,10 @@ static inline int crash_hotplug_cpu_support(void) { return 0; }
>>   static inline int crash_hotplug_memory_support(void) { return 0; }
>>   #endif
>> +#ifndef crash_get_elfcorehdr_size
>> +static inline crash_get_elfcorehdr_size(void) { return 0; }
>> +#endif
>> +
>>   #else /* !CONFIG_KEXEC_CORE */
>>   struct pt_regs;
>>   struct task_struct;
>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 981016e05cfa..01766dd839b0 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -12,6 +12,7 @@
>>   /* kexec flags for different usage scenarios */
>>   #define KEXEC_ON_CRASH        0x00000001
>>   #define KEXEC_PRESERVE_CONTEXT    0x00000002
>> +#define KEXEC_UPDATE_ELFCOREHDR    0x00000004
>>   #define KEXEC_ARCH_MASK        0xffff0000
>>   /*
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index ef6e91daad56..e05bfdb7eaed 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -704,6 +704,33 @@ subsys_initcall(crash_save_vmcoreinfo_init);
>>   #ifdef CONFIG_CRASH_HOTPLUG
>>   #undef pr_fmt
>>   #define pr_fmt(fmt) "crash hp: " fmt
>> +
>> +/*
>> + * This routine utilized when the crash_hotplug sysfs node is read.
>> + * It reflects the kernel's ability/permission to update the crash
>> + * elfcorehdr directly.
>> + */
>> +int crash_check_update_elfcorehdr(void)
>> +{
>> +    int rc = 0;
>> +
>> +    /* Obtain lock while reading crash information */
>> +    if (!kexec_trylock()) {
>> +        pr_info("kexec_trylock() failed, elfcorehdr may be inaccurate\n");
>> +        return 0;
>> +    }
>> +    if (kexec_crash_image) {
>> +        if (kexec_crash_image->file_mode)
>> +            rc = 1;
>> +        else
>> +            rc = kexec_crash_image->update_elfcorehdr;
>> +    }
>> +    /* Release lock now that update complete */
>> +    kexec_unlock();
>> +
>> +    return rc;
>> +}
>> +
>>   /*
>>    * To accurately reflect hot un/plug changes of cpu and memory resources
>>    * (including onling and offlining of those resources), the elfcorehdr
>> @@ -734,6 +761,10 @@ static void crash_handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
>>       image = kexec_crash_image;
>> +    /* Check that updating elfcorehdr is permitted */
>> +    if (!(image->file_mode || image->update_elfcorehdr))
>> +        goto out;
>> +
>>       if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
>>           hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
>>           pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index 92d301f98776..60de64bd14b9 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -129,6 +129,9 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
>>       if (flags & KEXEC_PRESERVE_CONTEXT)
>>           image->preserve_context = 1;
>> +    if (flags & KEXEC_UPDATE_ELFCOREHDR)
>> +        image->update_elfcorehdr = 1;
>> +
>>       ret = machine_kexec_prepare(image);
>>       if (ret)
>>           goto out;
>> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
>> index aad7a3bfd846..1d4bc493b2f4 100644
>> --- a/kernel/ksysfs.c
>> +++ b/kernel/ksysfs.c
>> @@ -165,6 +165,18 @@ static ssize_t vmcoreinfo_show(struct kobject *kobj,
>>   }
>>   KERNEL_ATTR_RO(vmcoreinfo);
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +static ssize_t crash_elfcorehdr_size_show(struct kobject *kobj,
>> +                   struct kobj_attribute *attr, char *buf)
>> +{
>> +    unsigned int sz = crash_get_elfcorehdr_size();
>> +
>> +    return sysfs_emit(buf, "%u\n", sz);
>> +}
>> +KERNEL_ATTR_RO(crash_elfcorehdr_size);
>> +
>> +#endif
>> +
>>   #endif /* CONFIG_CRASH_CORE */
>>   /* whether file capabilities are enabled */
>> @@ -255,6 +267,9 @@ static struct attribute * kernel_attrs[] = {
>>   #endif
>>   #ifdef CONFIG_CRASH_CORE
>>       &vmcoreinfo_attr.attr,
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +    &crash_elfcorehdr_size_attr.attr,
>> +#endif
>>   #endif
>>   #ifndef CONFIG_TINY_RCU
>>       &rcu_expedited_attr.attr,
Powered by blists - more mailing lists
 
