lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87956f31-472d-4091-8061-1e55fea7a3d7@redhat.com>
Date: Wed, 16 Oct 2024 17:54:56 +0200
From: David Hildenbrand <david@...hat.com>
To: Alexander Egorenkov <egorenar@...ux.ibm.com>
Cc: agordeev@...ux.ibm.com, akpm@...ux-foundation.org,
 borntraeger@...ux.ibm.com, cohuck@...hat.com, corbet@....net,
 eperezma@...hat.com, frankja@...ux.ibm.com, gor@...ux.ibm.com,
 hca@...ux.ibm.com, imbrenda@...ux.ibm.com, jasowang@...hat.com,
 kvm@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 linux-s390@...r.kernel.org, mcasquer@...hat.com, mst@...hat.com,
 svens@...ux.ibm.com, thuth@...hat.com, virtualization@...ts.linux.dev,
 xuanzhuo@...ux.alibaba.com, zaslonko@...ux.ibm.com
Subject: Re: [PATCH v2 1/7] s390/kdump: implement is_kdump_kernel()

On 16.10.24 17:47, David Hildenbrand wrote:
>>>
>>> When I wrote that code I was rather convinced that the variant in this patch
>>> is the right thing to do.
>>
>> A short explanation about what a stand-alone kdump is.
>>
>> * First, it's not really a _regular_ kdump activated with kexec-tools and
>>     executed by Linux itself but a regular stand-alone dump (SCSI) from the
>>     FW's perspective (one has to use HMC or dumpconf to execute it and not
>>     with kexec-tools like for the _regular_ kdump).
> 
> Ah, that makes sense.
> 
>> * One has to reserve crashkernel memory region in the old crashed kernel
>>     even if it remains unused until the dump starts.
>> * zipl uses regular kdump kernel and initramfs to create stand-alone
>>     dumper images and to write them to a dump disk which is used for
>>     IPLIng the stand-alone dumper.
>> * The zipl bootloader takes care of transferring the old kernel memory
>>     saved in HSA by the FW to the crashkernel memory region reserved by the old
>>     crashed kernel before it enters the dumper. The HSA memory is released
>>     by the zipl bootloader _before_ the dumper image is entered,
>>     therefore, we cannot use HSA to read old kernel memory, and instead
>>     use memory from crashkernel region, just like the regular kdump.
>> * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
>>     the dumper like a regular stand-alone dump (e.g. zfcpdump).
>> * Summarized, zipl bootloader prepares an environment which is expected by
>>     the regular kdump for a stand-alone kdump dumper before it is entered.
> 
> Thanks for the details!
> 
>>
>> In my opinion, the correct version of is_kdump_kernel() would be
>>
>> bool is_kdump_kernel(void)
>> {
>>           return oldmem_data.start;
>> }
>>
>> because Linux kernel doesn't differentiate between both the regular
>> and the stand-alone kdump where it matters while performing dumper
>> operations (e.g. reading saved old kernel memory from crashkernel memory region).
>>
> 
> Right, but if we consider "/proc/vmcore is available", a better version
> would IMHO be:
> 
> bool is_kdump_kernel(void)
> {
>             return dump_available();
> }
> 
> Because that is mostly (not completely) how is_kdump_kernel() would have
> worked right now *after* we had the elfcorehdr_alloc() during the
> fs_init call.
> 
> 
>> Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel()
>> is to tell us whether Linux kernel runs in a kdump like environment and not
>> whether the current mode is identical to the proper and true kdump,
>> right ? And if stand-alone kdump swims like a duck, quacks like one, then it
>> is one, regardless how it was started, by kexecing or IPLing
>> from a disk.
> 
> Same thinking here.
> 
>>
>> The stand-alone kdump has a very special use case which most users will
>> never encounter. And usually, one just takes zfcpdump instead which is
>> more robust and much smaller considering how big kdump initrd can get.
>> stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.
> 
> Makes sense, so it boils down to either
> 
> bool is_kdump_kernel(void)
> {
>            return oldmem_data.start;
> }
> 
> Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
> available or
> 
> bool is_kdump_kernel(void)
> {
>            return dump_available();
> }
> 
> Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
> available. There is the chance of is_kdump_kernel() being "true" if
> "elfcorehdr_alloc()" fails with -ENODEV.
> 
> 
> You're call :) Thanks!
> 

What I think we should do is the following (improved comment + patch
description), but I'll do whatever you think is better:


 From e86194b5195c743eff33f563796b9c725fecc65f Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@...hat.com>
Date: Wed, 4 Sep 2024 14:57:10 +0200
Subject: [PATCH] s390/kdump: provide custom is_kdump_kernel()

s390 currently always results in is_kdump_kernel() == false until
vmcore_init()->elfcorehdr_alloc() ran, because it sets
"elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to deactivate
any elfcorehdr= kernel parameter.

Let's follow the powerpc example and implement our own logic. Let's use
"dump_available()", because this is mostly (with one exception when
elfcorehdr_alloc() fails with -ENODEV) when we would create /proc/vmcore
and when is_kdump_kernel() would have returned "true" after
vmcore_init().

This is required for virtio-mem to reliably identify a kdump
environment before vmcore_init() was called to not try hotplugging memory.

Update the documentation above dump_available().

Tested-by: Mario Casquero <mcasquer@...hat.com>
Signed-off-by: David Hildenbrand <david@...hat.com>
---
  arch/s390/include/asm/kexec.h |  4 ++++
  arch/s390/kernel/crash_dump.c |  6 ++++++
  arch/s390/kernel/smp.c        | 16 ++++++++--------
  3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 1bd08eb56d5f..bd20543515f5 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void);
  
  void arch_kexec_unprotect_crashkres(void);
  #define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres
+
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
  #endif
  
  #ifdef CONFIG_KEXEC_FILE
@@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
  int arch_kimage_file_post_load_cleanup(struct kimage *image);
  #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
  #endif
+
  #endif /*_S390_KEXEC_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 51313ed7e617..43bbaf534dd2 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
  						       prot);
  }
  
+bool is_kdump_kernel(void)
+{
+	return dump_available();
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
  static const char *nt_name(Elf64_Word type)
  {
  	const char *name = "LINUX";
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4df56fdb2488..bd41e35a27a0 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -574,7 +574,7 @@ int smp_store_status(int cpu)
  
  /*
   * Collect CPU state of the previous, crashed system.
- * There are four cases:
+ * There are three cases:
   * 1) standard zfcp/nvme dump
   *    condition: OLDMEM_BASE == NULL && is_ipl_type_dump() == true
   *    The state for all CPUs except the boot CPU needs to be collected
@@ -587,16 +587,16 @@ int smp_store_status(int cpu)
   *    with sigp stop-and-store-status. The firmware or the boot-loader
   *    stored the registers of the boot CPU in the absolute lowcore in the
   *    memory of the old system.
- * 3) kdump and the old kernel did not store the CPU state,
- *    or stand-alone kdump for DASD
- *    condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
+ * 3) kdump or stand-alone kdump for DASD
+ *    condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
   *    The state for all CPUs except the boot CPU needs to be collected
   *    with sigp stop-and-store-status. The kexec code or the boot-loader
   *    stored the registers of the boot CPU in the memory of the old system.
- * 4) kdump and the old kernel stored the CPU state
- *    condition: OLDMEM_BASE != NULL && is_kdump_kernel()
- *    This case does not exist for s390 anymore, setup_arch explicitly
- *    deactivates the elfcorehdr= kernel parameter
+ *
+ * Note that the old kdump mode where the old kernel stored the CPU state
+ * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
+ * kernel parameter. The is_kdump_kernel() implementation on s390 is independent
+ * of the elfcorehdr= parameter, and is purely based on dump_available().
   */
  static bool dump_available(void)
  {
-- 
2.46.1


-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ