Message-ID: <c9db3d66-f74d-fcab-837e-bfcabae8bce7@linux.ibm.com>
Date:   Tue, 7 Mar 2023 14:18:57 +0530
From:   Sourabh Jain <sourabhjain@...ux.ibm.com>
To:     Eric DeVolder <eric.devolder@...cle.com>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        kexec@...ts.infradead.org, ebiederm@...ssion.com,
        dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com
Cc:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        dave.hansen@...ux.intel.com, hpa@...or.com,
        nramas@...ux.microsoft.com, thomas.lendacky@....com,
        robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
        konrad.wilk@...cle.com, boris.ostrovsky@...cle.com
Subject: Re: [PATCH v19 6/7] crash: change crash_prepare_elf64_headers() to
 for_each_possible_cpu()
Hello Eric,
On 06/03/23 21:52, Eric DeVolder wrote:
> The function crash_prepare_elf64_headers() generates the elfcorehdr
> which describes the cpus and memory in the system for the crash kernel.
> In particular, it writes out ELF PT_NOTEs for memory regions and the
> cpus in the system.
>
> With respect to the cpus, the current implementation utilizes
> for_each_present_cpu(), which means that as cpus are added and removed,
> the elfcorehdr must again be updated to reflect the new set of cpus.
>
> The reasoning behind the change to use for_each_possible_cpu() is:
>
> - At kernel boot time, all percpu crash_notes are allocated for all
>    possible cpus; that is, crash_notes are not allocated dynamically
>    when cpus are plugged/unplugged. Thus the crash_notes for each
>    possible cpu are always available.
>
> - crash_prepare_elf64_headers() creates an ELF PT_NOTE per cpu.
>    Changing to for_each_possible_cpu() is valid as the crash_notes
>    pointed to by each cpu PT_NOTE are present and always valid.
>
> Furthermore, examining a common crash processing path of:
>
>   kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer
>             elfcorehdr      /proc/vmcore     vmcore
>
> reveals how the ELF cpu PT_NOTEs are utilized:
>
> - Upon panic, each cpu is sent an IPI and shuts itself down, recording
>   its state in its crash_notes. When all cpus are shut down, the
>   crash kernel is launched with a pointer to the elfcorehdr.
>
> - The crash kernel via linux/fs/proc/vmcore.c does not examine or
>   use the contents of the PT_NOTEs; it merely exposes them via /proc/vmcore.
>
> - The makedumpfile utility uses /proc/vmcore and reads the cpu
>   PT_NOTEs to craft a nr_cpus variable, which is reported in a
>   header but otherwise generally unused. Makedumpfile creates the
>   vmcore.
>
> - The 'crash' dump analyzer does not appear to reference the cpu
>   PT_NOTEs. Instead it looks up the cpu_[possible|present|online]_mask
>   symbols and directly examines those structure contents from vmcore
>   memory. From that information it is able to determine which cpus
>   are present and online, and locate the corresponding crash_notes.
>   Said differently, it appears that the 'crash' analyzer does not rely
>   on the ELF PT_NOTEs for cpus; rather it obtains the information
>   directly via kernel symbols and the memory within the vmcore.
>
> (There may be other vmcore generating and analysis tools that do use
> these PT_NOTEs, but 'makedumpfile' and 'crash' seem to be the most
> common solution.)
>
> The benefit of this change is that all cpus are described in the
> elfcorehdr, reducing the need to regenerate the elfcorehdr on cpu
> changes, at the small expense of an additional 56 bytes (one
> Elf64_Phdr) per PT_NOTE for not-present-but-possible cpus.
>
> On systems where the kexec_file_load() syscall is utilized, all the above
> is valid. On systems where the kexec_load() syscall is utilized, the
> elfcorehdr may need to be regenerated once, because some archs only
> populate the 'present' cpus in the /sys/devices/system/cpu entries,
> which the userspace 'kexec' utility uses to generate the
> userspace-supplied elfcorehdr. In this situation, the first memory or
> cpu change rewrites the elfcorehdr via the crash_prepare_elf64_headers()
> function, and from then on all possible cpus are described, just as
> with the kexec_file_load() syscall.
>
> Suggested-by: Sourabh Jain <sourabhjain@...ux.ibm.com>
> Signed-off-by: Eric DeVolder <eric.devolder@...cle.com>
> ---
>   kernel/crash_core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index dba4b75f7541..537b199a8774 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -365,7 +365,7 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map,
>   	ehdr->e_phentsize = sizeof(Elf64_Phdr);
>   
>   	/* Prepare one phdr of type PT_NOTE for each present CPU */
We need to change this comment as well, since the loop now iterates over
possible CPUs rather than present ones.
> -	for_each_present_cpu(cpu) {
> +	for_each_possible_cpu(cpu) {
- Sourabh Jain