lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 10 Feb 2023 18:35:09 -0600 From: Eric DeVolder <eric.devolder@...cle.com> To: Sourabh Jain <sourabhjain@...ux.ibm.com>, Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org, x86@...nel.org, kexec@...ts.infradead.org, ebiederm@...ssion.com, dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com Cc: mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, nramas@...ux.microsoft.com, thomas.lendacky@....com, robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com, konrad.wilk@...cle.com, boris.ostrovsky@...cle.com Subject: Re: [PATCH v18 5/7] kexec: exclude hot remove cpu from elfcorehdr notes On 2/10/23 00:29, Sourabh Jain wrote: > > On 10/02/23 01:09, Eric DeVolder wrote: >> >> >> On 2/9/23 12:43, Sourabh Jain wrote: >>> Hello Eric, >>> >>> On 09/02/23 23:01, Eric DeVolder wrote: >>>> >>>> >>>> On 2/8/23 07:44, Thomas Gleixner wrote: >>>>> Eric! >>>>> >>>>> On Tue, Feb 07 2023 at 11:23, Eric DeVolder wrote: >>>>>> On 2/1/23 05:33, Thomas Gleixner wrote: >>>>>> >>>>>> So my latest solution is introduce two new CPUHP states, CPUHP_AP_ELFCOREHDR_ONLINE >>>>>> for onlining and CPUHP_BP_ELFCOREHDR_OFFLINE for offlining. I'm open to better names. >>>>>> >>>>>> The CPUHP_AP_ELFCOREHDR_ONLINE needs to be placed after CPUHP_BRINGUP_CPU. My >>>>>> attempts at locating this state failed when inside the STARTING section, so I located >>>>>> this just inside the ONLINE sectoin. The crash hotplug handler is registered on >>>>>> this state as the callback for the .startup method. >>>>>> >>>>>> The CPUHP_BP_ELFCOREHDR_OFFLINE needs to be placed before CPUHP_TEARDOWN_CPU, and I >>>>>> placed it at the end of the PREPARE section. This crash hotplug handler is also >>>>>> registered on this state as the callback for the .teardown method. >>>>> >>>>> TBH, that's still overengineered. Something like this: >>>>> >>>>> bool cpu_is_alive(unsigned int cpu) >>>>> { >>>>> struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu); >>>>> >>>>> return data_race(st->state) <= CPUHP_AP_IDLE_DEAD; >>>>> } >>>>> >>>>> and use this to query the actual state at crash time. That spares all >>>>> those callback heuristics. >>>>> >>>>>> I'm making my way though percpu crash_notes, elfcorehdr, vmcoreinfo, >>>>>> makedumpfile and (the consumer of it all) the userspace crash utility, >>>>>> in order to understand the impact of moving from for_each_present_cpu() >>>>>> to for_each_online_cpu(). >>>>> >>>>> Is the packing actually worth the trouble? What's the actual win? >>>>> >>>>> Thanks, >>>>> >>>>> tglx >>>>> >>>>> >>>> >>>> Thomas, >>>> I've investigated the passing of crash notes through the vmcore. What I've learned is that: >>>> >>>> - linux/fs/proc/vmcore.c (which makedumpfile references to do its job) does >>>> not care what the contents of cpu PT_NOTES are, but it does coalesce them together. >>>> >>>> - makedumpfile will count the number of cpu PT_NOTES in order to determine its >>>> nr_cpus variable, which is reported in a header, but otherwise unused (except >>>> for sadump method). >>>> >>>> - the crash utility, for the purposes of determining the cpus, does not appear to >>>> reference the elfcorehdr PT_NOTEs. Instead it locates the various >>>> cpu_[possible|present|online]_mask and computes nr_cpus from that, and also of >>>> course which are online. In addition, when crash does reference the cpu PT_NOTE, >>>> to get its prstatus, it does so by using a percpu technique directly in the vmcore >>>> image memory, not via the ELF structure. Said differently, it appears to me that >>>> crash utility doesn't rely on the ELF PT_NOTEs for cpus; rather it obtains them >>>> via kernel cpumasks and the memory within the vmcore. >>>> >>>> With this understanding, I did some testing. Perhaps the most telling test was that I >>>> changed the number of cpu PT_NOTEs emitted in the crash_prepare_elf64_headers() to just 1, >>>> hot plugged some cpus, then also took a few offline sparsely via chcpu, then generated a >>>> vmcore. The crash utility had no problem loading the vmcore, it reported the proper number >>>> of cpus and the number offline (despite only one cpu PT_NOTE), and changing to a different >>>> cpu via 'set -c 30' and the backtrace was completely valid. >>>> >>>> My take away is that crash utility does not rely upon ELF cpu PT_NOTEs, it obtains the >>>> cpu information directly from kernel data structures. Perhaps at one time crash relied >>>> upon the ELF information, but no more. (Perhaps there are other crash dump analyzers >>>> that might rely on the ELF info?) >>>> >>>> So, all this to say that I see no need to change crash_prepare_elf64_headers(). There >>>> is no compelling reason to move away from for_each_present_cpu(), or modify the list for >>>> online/offline. >>>> >>>> Which then leaves the topic of the cpuhp state on which to register. Perhaps reverting >>>> back to the use of CPUHP_BP_PREPARE_DYN is the right answer. There does not appear to >>>> be a compelling need to accurately track whether the cpu went online/offline for the >>>> purposes of creating the elfcorehdr, as ultimately the crash utility pulls that from >>>> kernel data structures, not the elfcorehdr. >>>> >>>> I think this is what Sourabh has known and has been advocating for an optimization >>>> path that allows not regenerating the elfcorehdr on cpu changes (because all the percpu >>>> structs are all laid out). I do think it best to leave that as an arch choice. >>> >>> Since things are clear on how the PT_NOTES are consumed in kdump kernel [fs/proc/vmcore.c], >>> makedumpfile, and crash tool I need your opinion on this: >>> >>> Do we really need to regenerate elfcorehdr for CPU hotplug events? >>> If yes, can you please list the elfcorehdr components that changes due to CPU hotplug. >> Due to the use of for_each_present_cpu(), it is possible for the number of cpu PT_NOTEs >> to fluctuate as cpus are un/plugged. Onlining/offlining of cpus does not impact the >> number of cpu PT_NOTEs (as the cpus are still present). >> >>> >>> From what I understood, crash notes are prepared for possible CPUs as system boots and >>> could be used to create a PT_NOTE section for each possible CPU while generating the elfcorehdr >>> during the kdump kernel load. >>> >>> Now once the elfcorehdr is loaded with PT_NOTEs for every possible CPU there is no need to >>> regenerate it for CPU hotplug events. Or do we? >> >> For onlining/offlining of cpus, there is no need to regenerate the elfcorehdr. However, >> for actual hot un/plug of cpus, the answer is yes due to for_each_present_cpu(). The >> caveat here of course is that if crash utility is the only coredump analyzer of concern, >> then it doesn't care about these cpu PT_NOTEs and there would be no need to re-generate them. >> >> Also, I'm not sure if ARM cpu hotplug, which is just now coming into mainstream, impacts >> any of this. >> >> Perhaps the one item that might help here is to distinguish between actual hot un/plug of >> cpus, versus onlining/offlining. At the moment, I can not distinguish between a hot plug >> event and an online event (and unplug/offline). If those were distinguishable, then we >> could only regenerate on un/plug events. >> >> Or perhaps moving to for_each_possible_cpu() is the better choice? > > Yes, because once elfcorehdr is built with possible CPUs we don't have to worry about > hot[un]plug case. > > Here is my view on how things should be handled if a core-dump analyzer is dependent on > elfcorehdr PT_NOTEs to find online/offline CPUs. > > A PT_NOTE in elfcorehdr holds the address of the corresponding crash notes (kernel has > one crash note per CPU for every possible CPU). Though the crash notes are allocated > during the boot time they are populated when the system is on the crash path. > > This is how crash notes are populated on PowerPC and I am expecting it would be something > similar on other architectures too. > > The crashing CPU sends IPI to every other online CPU with a callback function that updates the > crash notes of that specific CPU. Once the IPI completes the crashing CPU updates its own crash > note and proceeds further. > > The crash notes of CPUs remain uninitialized if the CPUs were offline or hot unplugged at the time > system crash. The core-dump analyzer should be able to identify [un]/initialized crash notes > and display the information accordingly. > > Thoughts? > > - Sourabh In general, I agree with your points. You've presented a strong case to go with for_each_possible_cpu() in crash_prepare_elf64_headers() and those crash notes would always be present, and we can ignore changes to cpus wrt/ elfcorehdr updates. But what do we do about kexec_load() syscall? The way the userspace utility works is it determines cpus by: nr_cpus = sysconf(_SC_NPROCESSORS_CONF); which is not the equivalent of possible_cpus. So the complete list of cpu PT_NOTEs is not generated up front. We would need a solution for that? Thanks, eric PS. I'll be on vacation all of next week, returning 20feb.
Powered by blists - more mailing lists