[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87956f31-472d-4091-8061-1e55fea7a3d7@redhat.com>
Date: Wed, 16 Oct 2024 17:54:56 +0200
From: David Hildenbrand <david@...hat.com>
To: Alexander Egorenkov <egorenar@...ux.ibm.com>
Cc: agordeev@...ux.ibm.com, akpm@...ux-foundation.org,
borntraeger@...ux.ibm.com, cohuck@...hat.com, corbet@....net,
eperezma@...hat.com, frankja@...ux.ibm.com, gor@...ux.ibm.com,
hca@...ux.ibm.com, imbrenda@...ux.ibm.com, jasowang@...hat.com,
kvm@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-s390@...r.kernel.org, mcasquer@...hat.com, mst@...hat.com,
svens@...ux.ibm.com, thuth@...hat.com, virtualization@...ts.linux.dev,
xuanzhuo@...ux.alibaba.com, zaslonko@...ux.ibm.com
Subject: Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()
On 16.10.24 17:47, David Hildenbrand wrote:
>>>
>>> When I wrote that code I was rather convinced that the variant in this patch
>>> is the right thing to do.
>>
>> A short explanation about what a stand-alone kdump is.
>>
>> * First, it's not really a _regular_ kdump activated with kexec-tools and
>> executed by Linux itself but a regular stand-alone dump (SCSI) from the
>> FW's perspective (one has to use HMC or dumpconf to execute it and not
>> with kexec-tools like for the _regular_ kdump).
>
> Ah, that makes sense.
>
>> * One has to reserve crashkernel memory region in the old crashed kernel
>> even if it remains unused until the dump starts.
>> * zipl uses regular kdump kernel and initramfs to create stand-alone
>> dumper images and to write them to a dump disk which is used for
>> IPLIng the stand-alone dumper.
>> * The zipl bootloader takes care of transferring the old kernel memory
>> saved in HSA by the FW to the crashkernel memory region reserved by the old
>> crashed kernel before it enters the dumper. The HSA memory is released
>> by the zipl bootloader _before_ the dumper image is entered,
>> therefore, we cannot use HSA to read old kernel memory, and instead
>> use memory from crashkernel region, just like the regular kdump.
>> * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
>> the dumper like a regular stand-alone dump (e.g. zfcpdump).
>> * Summarized, zipl bootloader prepares an environment which is expected by
>> the regular kdump for a stand-alone kdump dumper before it is entered.
>
> Thanks for the details!
>
>>
>> In my opinion, the correct version of is_kdump_kernel() would be
>>
>> bool is_kdump_kernel(void)
>> {
>> return oldmem_data.start;
>> }
>>
>> because Linux kernel doesn't differentiate between both the regular
>> and the stand-alone kdump where it matters while performing dumper
>> operations (e.g. reading saved old kernel memory from crashkernel memory region).
>>
>
> Right, but if we consider "/proc/vmcore is available", a better version
> would IMHO be:
>
> bool is_kdump_kernel(void)
> {
> return dump_available();
> }
>
> Because that is mostly (not completely) how is_kdump_kernel() would have
> worked right now *after* we had the elfcorehdr_alloc() during the
> fs_init call.
>
>
>> Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel()
>> is to tell us whether Linux kernel runs in a kdump like environment and not
>> whether the current mode is identical to the proper and true kdump,
>> right ? And if stand-alone kdump swims like a duck, quacks like one, then it
>> is one, regardless how it was started, by kexecing or IPLing
>> from a disk.
>
> Same thinking here.
>
>>
>> The stand-alone kdump has a very special use case which most users will
>> never encounter. And usually, one just takes zfcpdump instead which is
>> more robust and much smaller considering how big kdump initrd can get.
>> stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.
>
> Makes sense, so it boils down to either
>
> bool is_kdump_kernel(void)
> {
> return oldmem_data.start;
> }
>
> Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
> available or
>
> bool is_kdump_kernel(void)
> {
> return dump_available();
> }
>
> Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
> available. There is the chance of is_kdump_kernel() being "true" if
> "elfcorehdr_alloc()" fails with -ENODEV.
>
>
> You're call :) Thanks!
>
What I think we should do is the following (improved comment + patch
description), but I'll do whatever you think is better:
From e86194b5195c743eff33f563796b9c725fecc65f Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@...hat.com>
Date: Wed, 4 Sep 2024 14:57:10 +0200
Subject: [PATCH] s390/kdump: provide custom is_kdump_kernel()
s390 currently always results in is_kdump_kernel() == false until
vmcore_init()->elfcorehdr_alloc() ran, because it sets
"elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to deactivate
any elfcorehdr= kernel parameter.
Let's follow the powerpc example and implement our own logic. Let's use
"dump_available()", because this is mostly (with one exception when
elfcorehdr_alloc() fails with -ENODEV) when we would create /proc/vmcore
and when is_kdump_kernel() would have returned "true" after
vmcore_init().
This is required for virtio-mem to reliably identify a kdump
environment before vmcore_init() was called to not try hotplugging memory.
Update the documentation above dump_available().
Tested-by: Mario Casquero <mcasquer@...hat.com>
Signed-off-by: David Hildenbrand <david@...hat.com>
---
arch/s390/include/asm/kexec.h | 4 ++++
arch/s390/kernel/crash_dump.c | 6 ++++++
arch/s390/kernel/smp.c | 16 ++++++++--------
3 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 1bd08eb56d5f..bd20543515f5 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void);
void arch_kexec_unprotect_crashkres(void);
#define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres
+
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
#endif
#ifdef CONFIG_KEXEC_FILE
@@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
int arch_kimage_file_post_load_cleanup(struct kimage *image);
#define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
#endif
+
#endif /*_S390_KEXEC_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 51313ed7e617..43bbaf534dd2 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
prot);
}
+bool is_kdump_kernel(void)
+{
+ return dump_available();
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
static const char *nt_name(Elf64_Word type)
{
const char *name = "LINUX";
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4df56fdb2488..bd41e35a27a0 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -574,7 +574,7 @@ int smp_store_status(int cpu)
/*
* Collect CPU state of the previous, crashed system.
- * There are four cases:
+ * There are three cases:
* 1) standard zfcp/nvme dump
* condition: OLDMEM_BASE == NULL && is_ipl_type_dump() == true
* The state for all CPUs except the boot CPU needs to be collected
@@ -587,16 +587,16 @@ int smp_store_status(int cpu)
* with sigp stop-and-store-status. The firmware or the boot-loader
* stored the registers of the boot CPU in the absolute lowcore in the
* memory of the old system.
- * 3) kdump and the old kernel did not store the CPU state,
- * or stand-alone kdump for DASD
- * condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
+ * 3) kdump or stand-alone kdump for DASD
+ * condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
* The state for all CPUs except the boot CPU needs to be collected
* with sigp stop-and-store-status. The kexec code or the boot-loader
* stored the registers of the boot CPU in the memory of the old system.
- * 4) kdump and the old kernel stored the CPU state
- * condition: OLDMEM_BASE != NULL && is_kdump_kernel()
- * This case does not exist for s390 anymore, setup_arch explicitly
- * deactivates the elfcorehdr= kernel parameter
+ *
+ * Note that the old kdump mode where the old kernel stored the CPU state
+ * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
+ * kernel parameter. The is_kdump_kernel() implementation on s390 is independent
+ * of the elfcorehdr= parameter, and is purely based on dump_available().
*/
static bool dump_available(void)
{
--
2.46.1
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists