Message-ID: <d465173e-a31a-c4d6-af51-59d9ff0c2edc@linux.ibm.com>
Date: Fri, 10 Feb 2023 00:13:00 +0530
From: Sourabh Jain <sourabhjain@...ux.ibm.com>
To: Eric DeVolder <eric.devolder@...cle.com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, x86@...nel.org,
kexec@...ts.infradead.org, ebiederm@...ssion.com,
dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com
Cc: mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, nramas@...ux.microsoft.com, thomas.lendacky@....com,
robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
konrad.wilk@...cle.com, boris.ostrovsky@...cle.com
Subject: Re: [PATCH v18 5/7] kexec: exclude hot remove cpu from elfcorehdr
notes
Hello Eric,
On 09/02/23 23:01, Eric DeVolder wrote:
>
>
> On 2/8/23 07:44, Thomas Gleixner wrote:
>> Eric!
>>
>> On Tue, Feb 07 2023 at 11:23, Eric DeVolder wrote:
>>> On 2/1/23 05:33, Thomas Gleixner wrote:
>>>
>>> So my latest solution is to introduce two new CPUHP states:
>>> CPUHP_AP_ELFCOREHDR_ONLINE for onlining and
>>> CPUHP_BP_ELFCOREHDR_OFFLINE for offlining. I'm open to better names.
>>>
>>> The CPUHP_AP_ELFCOREHDR_ONLINE state needs to be placed after
>>> CPUHP_BRINGUP_CPU. My attempts at locating this state inside the
>>> STARTING section failed, so I located it just inside the ONLINE
>>> section. The crash hotplug handler is registered on this state as
>>> the callback for the .startup method.
>>>
>>> The CPUHP_BP_ELFCOREHDR_OFFLINE state needs to be placed before
>>> CPUHP_TEARDOWN_CPU, and I placed it at the end of the PREPARE
>>> section. The crash hotplug handler is also registered on this state
>>> as the callback for the .teardown method.
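
For reference, the registration Eric describes would look roughly like
the sketch below. The two CPUHP_*_ELFCOREHDR_* states are the new ones
proposed in this thread (not existing constants), and the
crash_cpuhp_*() handlers, the crash_handle_hotplug_event() helper, and
the KEXEC_CRASH_HP_* event values follow this series' naming but are
illustrative here:

static int crash_cpuhp_online(unsigned int cpu)
{
	/* regenerate the elfcorehdr for a newly onlined cpu (illustrative) */
	crash_handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
	return 0;
}

static int crash_cpuhp_offline(unsigned int cpu)
{
	/* regenerate the elfcorehdr for a cpu going offline (illustrative) */
	crash_handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
	return 0;
}

static int __init crash_hotplug_init(void)
{
	/* .startup callback, just inside the ONLINE section */
	cpuhp_setup_state_nocalls(CPUHP_AP_ELFCOREHDR_ONLINE,
				  "crash/elfcorehdr:online",
				  crash_cpuhp_online, NULL);
	/* .teardown callback, at the end of the PREPARE section */
	cpuhp_setup_state_nocalls(CPUHP_BP_ELFCOREHDR_OFFLINE,
				  "crash/elfcorehdr:offline",
				  NULL, crash_cpuhp_offline);
	return 0;
}
subsys_initcall(crash_hotplug_init);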
>>
>> TBH, that's still overengineered. Something like this:
>>
>> bool cpu_is_alive(unsigned int cpu)
>> {
>>         struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>>
>>         return data_race(st->state) > CPUHP_AP_IDLE_DEAD;
>> }
>>
>> and use this to query the actual state at crash time. That spares all
>> those callback heuristics.
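
If I read this right, the crash-time code would then simply filter on
the live state instead of keeping the elfcorehdr in sync via callbacks,
something like this sketch:

	for_each_present_cpu(cpu) {
		if (!cpu_is_alive(cpu))
			continue;
		/* ... include this cpu's crash notes PT_NOTE ... */
	}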
>>
>>> I'm making my way through percpu crash_notes, elfcorehdr, vmcoreinfo,
>>> makedumpfile and (the consumer of it all) the userspace crash utility,
>>> in order to understand the impact of moving from for_each_present_cpu()
>>> to for_each_online_cpu().
>>
>> Is the packing actually worth the trouble? What's the actual win?
>>
>> Thanks,
>>
>> tglx
>>
>>
>
> Thomas,
> I've investigated the passing of crash notes through the vmcore. What
> I've learned is that:
>
> - linux/fs/proc/vmcore.c (which makedumpfile references to do its job)
>   does not care what the contents of the cpu PT_NOTEs are, but it does
>   coalesce them together.
>
> - makedumpfile counts the number of cpu PT_NOTEs in order to determine
>   its nr_cpus variable, which is reported in a header but otherwise
>   unused (except for the sadump method); a sketch of this counting
>   follows the list.
>
> - the crash utility, for the purposes of determining the cpus, does not
>   appear to reference the elfcorehdr PT_NOTEs. Instead it locates the
>   various cpu_[possible|present|online]_mask symbols and computes
>   nr_cpus from those, and of course also which cpus are online. In
>   addition, when crash does reference a cpu PT_NOTE to get its
>   prstatus, it does so by using a percpu technique directly in the
>   vmcore image memory, not via the ELF structure. Said differently, it
>   appears to me that the crash utility doesn't rely on the ELF PT_NOTEs
>   for cpus; rather it obtains them via the kernel cpumasks and the
>   memory within the vmcore.
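
The nr_cpus counting mentioned in the makedumpfile item above amounts to
walking the coalesced PT_NOTE segment and counting NT_PRSTATUS entries.
A minimal userspace sketch of that idea (not makedumpfile's actual
code):

#include <elf.h>
#include <stddef.h>

/* One NT_PRSTATUS note is emitted per cpu, so counting them yields
 * nr_cpus. Note name and desc are each padded to 4-byte alignment. */
static int count_prstatus_notes(const char *buf, size_t len)
{
	size_t off = 0;
	int nr_cpus = 0;

	while (off + sizeof(Elf64_Nhdr) <= len) {
		const Elf64_Nhdr *nh = (const Elf64_Nhdr *)(buf + off);

		if (nh->n_type == NT_PRSTATUS)
			nr_cpus++;
		off += sizeof(*nh) +
		       ((nh->n_namesz + 3) & ~3UL) +
		       ((nh->n_descsz + 3) & ~3UL);
	}
	return nr_cpus;
}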
>
> With this understanding, I did some testing. Perhaps the most telling
> test was this: I changed crash_prepare_elf64_headers() to emit just one
> cpu PT_NOTE, hot plugged some cpus, took a few offline sparsely via
> chcpu, then generated a vmcore. The crash utility had no problem
> loading the vmcore; it reported the proper number of cpus and the
> number offline (despite only one cpu PT_NOTE), and switching to a
> different cpu via 'set -c 30' produced a completely valid backtrace.
>
> My takeaway is that the crash utility does not rely upon the ELF cpu
> PT_NOTEs; it obtains the cpu information directly from kernel data
> structures. Perhaps at one time crash relied upon the ELF information,
> but no more. (Perhaps there are other crash dump analyzers that might
> still rely on the ELF info?)
>
> So, all this is to say that I see no need to change
> crash_prepare_elf64_headers(). There is no compelling reason to move
> away from for_each_present_cpu(), or to modify the list for
> online/offline cpus.
>
> Which then leaves the topic of the cpuhp state on which to register.
> Perhaps reverting to the use of CPUHP_BP_PREPARE_DYN is the right
> answer. There does not appear to be a compelling need to accurately
> track whether the cpu went online or offline for the purposes of
> creating the elfcorehdr, as ultimately the crash utility pulls that
> from kernel data structures, not the elfcorehdr.
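
If you do revert, then rather than the two fixed states sketched
earlier, a single dynamically allocated state carrying both callbacks
would presumably be enough, roughly:

	/* handler names as in the earlier sketch */
	cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN, "crash/elfcorehdr",
				  crash_cpuhp_online, crash_cpuhp_offline);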
>
> I think this is what Sourabh has known all along, and why he has been
> advocating for an optimization path that avoids regenerating the
> elfcorehdr on cpu changes (because all the percpu structs are already
> laid out). I do think it best to leave that as an arch choice.
Since things are clear on how the PT_NOTEs are consumed in the kdump
kernel [fs/proc/vmcore.c], makedumpfile, and the crash tool, I need your
opinion on this:

Do we really need to regenerate the elfcorehdr for CPU hotplug events?
If yes, can you please list the elfcorehdr components that change due to
CPU hotplug?

From what I understand, crash notes are prepared for possible CPUs as
the system boots, and they could be used to create a PT_NOTE section for
each possible CPU while generating the elfcorehdr during the kdump
kernel load.

Once the elfcorehdr is loaded with PT_NOTEs for every possible CPU,
there is no need to regenerate it for CPU hotplug events. Or do we?
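
Concretely, that would mean the PT_NOTE loop in
crash_prepare_elf64_headers() iterates over possible rather than
present CPUs. A sketch, modelled on the existing present-cpu loop
there (only the iterator changes):

	/* one PT_NOTE phdr per possible cpu, so that later hotplug
	 * events never change the elfcorehdr */
	for_each_possible_cpu(cpu) {
		phdr->p_type = PT_NOTE;
		notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu));
		phdr->p_offset = phdr->p_paddr = notes_addr;
		phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t);
		ehdr->e_phnum++;
		phdr++;
	}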
Thanks,
Sourabh Jain