linux-kernel - Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YzRxPAoN+XmOfJzV@zn.tnic>
Date:   Wed, 28 Sep 2022 18:07:24 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Eric DeVolder <eric.devolder@...cle.com>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        kexec@...ts.infradead.org, ebiederm@...ssion.com,
        dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com,
        tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
        hpa@...or.com, nramas@...ux.microsoft.com, thomas.lendacky@....com,
        robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
        sourabhjain@...ux.ibm.com, konrad.wilk@...cle.com,
        boris.ostrovsky@...cle.com
Subject: Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support

On Tue, Sep 13, 2022 at 02:12:31PM -0500, Eric DeVolder wrote:
> This topic was discussed previously https://lkml.org/lkml/2022/3/3/372.

Please do not use lkml.org to refer to lkml messages. We have a
perfectly fine archival system at lore.kernel.org. You simply do

https://lore.kernel.org/r/<Message-ID>

when you want to point to a previous mail.

> David points out that terminology is tricky here due to differing behaviors.
> And perhaps that is your point in asking for guidance text. It can be
> complicated

Which means you need an explanation how to use this even more.

And why is CONFIG_CRASH_MAX_MEMORY_RANGES even a Kconfig item and not
something you discover from the hardware?

Your help text talks about System RAM entries in /proc/iomem which means
that those entries are present somewhere in the kernel and you can read
them out and do the proper calculations dynamically instead of doing the
static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing.

> , but it all comes down to System RAM entries.
> 
> I could perhaps offer an overly simplified example such that for 1GiB block
> size, for example, the CRASH_MAX_MEMORY_RANGES of 32768 would allow for 32TiB
> of memory?

Yes, and stick it somewhere in Documentation/admin-guide/kdump/ and
refer to it in that help text so that people can find it and read how to
use your new option.

> The kbuf.bufsz value is obtained via a call to prepare_elf_headers(); I can
> not initialize it at its declaration.

Sorry, I meant this:

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8fc7d678ac72..ee6fd9f1b2b9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
 	if (ret)
 		return ret;
 
-	image->elf_headers = kbuf.buffer;
-	image->elf_headers_sz = kbuf.bufsz;
+	image->elf_headers	= kbuf.buffer;
+	image->elf_headers_sz	= kbuf.bufsz;
+	kbuf.memsz		= kbuf.bufsz;
 
 #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
 	/* Ensure elfcorehdr segment large enough for hotplug changes */
@@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
 	image->elf_headers_sz = kbuf.memsz;
 	image->elfcorehdr_index = image->nr_segments;
 	image->elfcorehdr_index_valid = true;
-#else
-	kbuf.memsz = kbuf.bufsz;
 #endif
+
 	kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
 	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
 	ret = kexec_add_buffer(&kbuf);

> I'm at a loss as to what to do differently here. You've raised this issue
> before and I went back and looked at the suggestions then and I don't see
> how that applies to this situation. How is this situation different than the
> #ifdef CONFIG_KEXEC_FILE that immediately preceeds it?

See the diff at the end. I'm not saying this is how you should do it
but it should give you a better idea. The logic being, the functions
in the .c file don't really need ifdeffery around them - you're adding
1-2 functions and crash.c is not that big - so they can be built in
unconditionally. You'd need the ifdeffery *in the header only* when
crash.c is not being built.

But I've done it with ifdeffery in the .c file now because yes, the
kexec code is a minefield of ifdeffery. Hell, there's ifdeffery even in
the headers for structs. Ifdeffery you don't really need. Someone should
clean that up and simplify this immensely.

> Currently there is a concurrent effort for PPC support by Sourabh
> Jain, and in that effort arch_map_crash_pages() is using __va(paddr).

Why?

> I do not know the nuances between kmap_local_page() and __va() to
> answer the question.

kmap_local_page() is a generic interface and it should work on any arch.

And it is documented even:

$ git grep kmap_local_page Documentation/

> If kmap_local_page() works for all archs, then I'm happy to drop these
> arch_ variants and use it directly.

Yes, pls do.

---

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 432073385b2d..b73c9628cd85 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -205,6 +205,17 @@ void *arch_kexec_kernel_image_load(struct kimage *image);
 
 int arch_kimage_file_post_load_cleanup(struct kimage *image);
 #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
+
+#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
+void *arch_map_crash_pages(unsigned long paddr, unsigned long size);
+void arch_unmap_crash_pages(void **ptr);
+void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action);
+#else
+void *arch_map_crash_pages(unsigned long paddr, unsigned long size) { return NULL; }
+void arch_unmap_crash_pages(void **ptr) { }
+void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
+#endif
+
 #endif
 #endif
 
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 8fc7d678ac72..a526c893abe8 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
 	if (ret)
 		return ret;
 
-	image->elf_headers = kbuf.buffer;
-	image->elf_headers_sz = kbuf.bufsz;
+	image->elf_headers	= kbuf.buffer;
+	image->elf_headers_sz	= kbuf.bufsz;
+	kbuf.memsz		= kbuf.bufsz;
 
 #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
 	/* Ensure elfcorehdr segment large enough for hotplug changes */
@@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
 	image->elf_headers_sz = kbuf.memsz;
 	image->elfcorehdr_index = image->nr_segments;
 	image->elfcorehdr_index_valid = true;
-#else
-	kbuf.memsz = kbuf.bufsz;
 #endif
+
 	kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
 	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
 	ret = kexec_add_buffer(&kbuf);
@@ -425,7 +425,8 @@ int crash_load_segments(struct kimage *image)
 }
 #endif /* CONFIG_KEXEC_FILE */
 
-#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES
+
 /*
  * NOTE: The addresses and sizes passed to this routine have
  * already been fully aligned on page boundaries. There is no
@@ -462,8 +463,7 @@ void arch_unmap_crash_pages(void **ptr)
  * is prepared in a kernel buffer, and then it is written on top
  * of the existing/old elfcorehdr.
  */
-void arch_crash_handle_hotplug_event(struct kimage *image,
-	unsigned int hp_action)
+void arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action)
 {
 	struct kexec_segment *ksegment;
 	unsigned char *ptr = NULL;
@@ -513,4 +513,5 @@ void arch_crash_handle_hotplug_event(struct kimage *image,
 	if (elfbuf)
 		vfree(elfbuf);
 }
-#endif
+
+#endif /* CONFIG_CRASH_MAX_MEMORY_RANGES */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index a48577a36fb8..0f79ad4c4f80 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -27,6 +27,19 @@ extern struct resource crashk_res;
 extern struct resource crashk_low_res;
 extern note_buf_t __percpu *crash_notes;
 
+/* Alignment required for elf header segment */
+#define ELF_CORE_HEADER_ALIGN   4096
+
+struct crash_mem_range {
+	u64 start, end;
+};
+
+struct crash_mem {
+	unsigned int max_nr_ranges;
+	unsigned int nr_ranges;
+	struct crash_mem_range ranges[];
+};
+
 #ifdef CONFIG_KEXEC_CORE
 #include <linux/list.h>
 #include <linux/compat.h>
@@ -237,19 +250,6 @@ static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 }
 #endif
 
-/* Alignment required for elf header segment */
-#define ELF_CORE_HEADER_ALIGN   4096
-
-struct crash_mem_range {
-	u64 start, end;
-};
-
-struct crash_mem {
-	unsigned int max_nr_ranges;
-	unsigned int nr_ranges;
-	struct crash_mem_range ranges[];
-};
-
 extern int crash_exclude_mem_range(struct crash_mem *mem,
 				   unsigned long long mstart,
 				   unsigned long long mend);
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 5bc5159d9cb1..f6b5d835f826 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -622,6 +622,15 @@ static int __init crash_save_vmcoreinfo_init(void)
 subsys_initcall(crash_save_vmcoreinfo_init);
 
 #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+
+void __weak *arch_map_crash_pages(unsigned long paddr, unsigned long size)
+{
+	return NULL;
+}
+
+void __weak arch_unmap_crash_pages(void **ptr) { }
+void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { }
+
 /*
  * To accurately reflect hot un/plug changes, the elfcorehdr (which
  * is passed to the crash kernel via the elfcorehdr= parameter)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette