linux-kernel - Re: [PATCH v18 5/7] kexec: exclude hot remove cpu from elfcorehdr notes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dd03f47a-0017-6239-04e9-e796dca03c0c@oracle.com>
Date:   Tue, 7 Feb 2023 11:23:58 -0600
From:   Eric DeVolder <eric.devolder@...cle.com>
To:     Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
        x86@...nel.org, kexec@...ts.infradead.org, ebiederm@...ssion.com,
        dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com
Cc:     mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, nramas@...ux.microsoft.com, thomas.lendacky@....com,
        robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
        sourabhjain@...ux.ibm.com, konrad.wilk@...cle.com,
        boris.ostrovsky@...cle.com
Subject: Re: [PATCH v18 5/7] kexec: exclude hot remove cpu from elfcorehdr
 notes



On 2/1/23 05:33, Thomas Gleixner wrote:
> Eric!
> 
> On Tue, Jan 31 2023 at 17:42, Eric DeVolder wrote:
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -366,6 +366,14 @@ int crash_prepare_elf64_headers(struct kimage *image, struct crash_mem *mem,
>>   
>>   	/* Prepare one phdr of type PT_NOTE for each present CPU */
>>   	for_each_present_cpu(cpu) {
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> +		if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
>> +			/* Skip the soon-to-be offlined cpu */
>> +			if ((image->hp_action == KEXEC_CRASH_HP_REMOVE_CPU) &&
>> +				(cpu == image->offlinecpu))
>> +				continue;
>> +		}
>> +#endif
> 
> I'm failing to see how the above is correct in any way. Look at the
> following sequence of events:
> 
>       1) Offline CPU$N
> 
>          -> Prepare elf headers with CPU$N excluded
> 
>       2) Another hotplug operation != 'Online CPU$N'
> 
>          -> Prepare elf headers with CPU$N included
> 
> Also in case of loading the crash kernel in the situation where not all
> present CPUs are online (think boot time SMT disable) then your
> resulting crash image will contain all present CPUs and none of the
> offline CPUs are excluded.
> 
> How does that make any sense at all?
> 
> This image->hp_action and image->offlinecpu dance is engineering
> voodoo. You just can do:
> 
>          for_each_present_cpu(cpu) {
>              if (!cpu_online(cpu))
>              	continue;
>              do_stuff(cpu);
> 
> which does the right thing in all situations and can be further
> simplified to:
> 
>          for_each_online_cpu(cpu) {
>              do_stuff(cpu);
> 
> without the need for ifdefs or whatever.
> 
> No?
> 
> Thanks,
> 
>          tglx

Thomas,

I've been re-examining the cpuhp framework and understand a bit better its
operation.

Up until now, this patch series has been using either CPUHP_AP_ONLINE_DYN
or more recently CPUHP_BP_PREPARE_DYN with the same handler for both the
startup and teardown callbacks. This resulted in the cpu state, as seen by
my handler, as being incorrect in one direction or the other. For example,
when using CPUHP_AP_ONLINE_DYN, cpu_online() always resulted in 1 for the
cpu in my callback, even during tear down. For CPUHP_BP_PREPARE_DYN,
cpu_online() always resulted in 0. Thus the offlinecpu voodoo.

But no more!

The reason, as I now understand, is simple. A cpu will not show as online
until after state CPUHP_BRINGUP_CPU (when working from CPUHP_OFFLINE towards
CPUHP_ONLINE). And a cpu will not show as offline until after state
CPUHP_TEARDOWN_CPU (when working reverse order from CPUHP_ONLINE to
CPUHP_OFFLINE).

The CPUHP_BRINGUP_CPU is the last state of the PREPARE section, and boots
the new cpu. It is code running on the booting cpu that marks itself as
online.

  CPUHP_BRINGUP_CPU
    .startup()
      bringup_cpu()
        __cpu_up()
         smp_ops.cpu_up()
          native_cpu_up()
           do_boot_cpu()
            ===== on new cpu! =====
            start_secondary()
             set_cpu_online(true)

There are quite a few CPUHP_..._STARTING states before the cpu is in a productive state.

The CPUHP_TEARDOWN_CPU is the last state in the STARTING section, and takes the cpu down.
Work/irqs are removed from this cpu and re-assigned to others.

  CPUHP_TEARDOWN_CPU
    .teardown()
     takedown_cpu()
      take_cpu_down()
       __cpu_disable()
        smp_ops.cpu_disable()
         native_cpu_disable()
          cpu_disable_common()
           remove_cpu_from_maps()
            set_cpu_online(false)

So my latest solution is introduce two new CPUHP states, CPUHP_AP_ELFCOREHDR_ONLINE
for onlining and CPUHP_BP_ELFCOREHDR_OFFLINE for offlining. I'm open to better names.

The CPUHP_AP_ELFCOREHDR_ONLINE needs to be placed after CPUHP_BRINGUP_CPU. My
attempts at locating this state failed when inside the STARTING section, so I located
this just inside the ONLINE sectoin. The crash hotplug handler is registered on
this state as the callback for the .startup method.

The CPUHP_BP_ELFCOREHDR_OFFLINE needs to be placed before CPUHP_TEARDOWN_CPU, and I
placed it at the end of the PREPARE section. This crash hotplug handler is also
registered on this state as the callback for the .teardown method.

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 6c6859bfc454..52d2db4d793e 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -131,6 +131,7 @@ enum cpuhp_state {
     CPUHP_ZCOMP_PREPARE,
     CPUHP_TIMERS_PREPARE,
     CPUHP_MIPS_SOC_PREPARE,
+   CPUHP_BP_ELFCOREHDR_OFFLINE,
     CPUHP_BP_PREPARE_DYN,
     CPUHP_BP_PREPARE_DYN_END        = CPUHP_BP_PREPARE_DYN + 20,
     CPUHP_BRINGUP_CPU,
@@ -205,6 +206,7 @@ enum cpuhp_state {

     /* Online section invoked on the hotplugged CPU from the hotplug thread */
     CPUHP_AP_ONLINE_IDLE,
+   CPUHP_AP_ELFCOREHDR_ONLINE,
     CPUHP_AP_SCHED_WAIT_EMPTY,
     CPUHP_AP_SMPBOOT_THREADS,
     CPUHP_AP_X86_VDSO_VMA_ONLINE,

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 8a439b6d723b..e1a3430f06f4 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c

+   if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
+       result = cpuhp_setup_state_nocalls(CPUHP_AP_ELFCOREHDR_ONLINE,
+                          "crash/cpuhp_online", crash_cpuhp_online, NULL);
+       result = cpuhp_setup_state_nocalls(CPUHP_BP_ELFCOREHDR_OFFLINE,
+                          "crash/cpuhp_offline", NULL, crash_cpuhp_offline);
+   }

With the above, there is no need for offlinecpu, as the crash hotplug handler
callback now observes the correct cpu_online() state in both online and offline
activities.

Which leads me to the next item. Thomas you suggested

           for_each_online_cpu(cpu) {
               do_stuff(cpu);

I've been looking into this further, and don't yet have conclusion.
In light of Sourabh's comments/concerns about packing PT_NOTES, I
need to determine if my introduction of

        if (IS_ENABLED(CONFIG_CRASH_HOTPLUG)) {
            if (!cpu_online(cpu)) continue;
        }

does not cause other downstream issues. My testing was focused on
hot plug/unplugging cpus in a last-on-first-off manner, where as
I now realize cpus can be onlined/offlined sparsely (thus the PT_NOTE
packing concern).

I'm making my way though percpu crash_notes, elfcorehdr, vmcoreinfo,
makedumpfile and (the consumer of it all) the userspace crash utility,
in order to understand the impact of moving from for_each_present_cpu()
to for_each_online_cpu().

At any rate, I wanted to at least put forth the introduction of the
two new CPUHP states and solicit feedback there while I investigate
the for_each_online_cpu() matter.

Thanks for pushing me on this topic!
eric