Message-ID: <fff8c92a-7e82-e917-8925-584664543b90@linux.ibm.com>
Date: Tue, 2 May 2023 15:06:29 +0530
From: Hari Bathini <hbathini@...ux.ibm.com>
To: Eric DeVolder <eric.devolder@...cle.com>,
Baoquan He <bhe@...hat.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
kexec@...ts.infradead.org, ebiederm@...ssion.com,
dyoung@...hat.com, vgoyal@...hat.com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, nramas@...ux.microsoft.com, thomas.lendacky@....com,
robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
sourabhjain@...ux.ibm.com, konrad.wilk@...cle.com,
boris.ostrovsky@...cle.com
Subject: Re: [PATCH v21 5/7] x86/crash: add x86 crash hotplug support
On 02/05/23 12:03 am, Eric DeVolder wrote:
>
>
> On 4/28/23 13:31, Hari Bathini wrote:
>>
>> On 28/04/23 2:55 pm, Baoquan He wrote:
>>> On 04/27/23 at 10:26pm, Hari Bathini wrote:
>>>> On 27/04/23 2:19 pm, Baoquan He wrote:
>>>>> On 04/27/23 at 12:39pm, Hari Bathini wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> On 04/04/23 11:33 pm, Eric DeVolder wrote:
>>>>>>> When CPU or memory is hot un/plugged, or off/onlined, the crash
>>>>>>> elfcorehdr, which describes the CPUs and memory in the system,
>>>>>>> must also be updated.
>>>>>>>
>>>>>>> The segment containing the elfcorehdr is identified at run-time
>>>>>>> in crash_core:crash_handle_hotplug_event(), which works for both
>>>>>>> the kexec_load() and kexec_file_load() syscalls. A new elfcorehdr
>>>>>>> is generated from the available CPUs and memory into a buffer,
>>>>>>> and then installed over the top of the existing elfcorehdr.
>>>>>>>
>>>>>>> In the patch 'kexec: exclude elfcorehdr from the segment digest'
>>>>>>> the need to update purgatory due to the change in elfcorehdr was
>>>>>>> eliminated. As a result, no changes to purgatory or boot_params
>>>>>>> (as the elfcorehdr= kernel command line parameter pointer
>>>>>>> remains unchanged and correct) are needed, just elfcorehdr.
>>>>>>>
>>>>>>> To accommodate a growing number of resources via hotplug, the
>>>>>>> elfcorehdr segment must be large enough to accommodate
>>>>>>> changes, see the CRASH_MAX_MEMORY_RANGES description. This is used
>>>>>>> only on the kexec_file_load() syscall; for kexec_load() userspace
>>>>>>> will need to size the segment similarly.
>>>>>>>
>>>>>>> To accommodate kexec_load() syscall in the absence of
>>>>>>
>>>>>> Firstly, thanks! This series is a nice improvement to kdump support
>>>>>> in hotplug environment.
> Thank you!
>
>>>>>>
>>>>>> One concern though is that this change assumes corresponding support
>>>>>> in kexec-tools. Without that support kexec_load would fail to boot
>>>>>> with digest verification failure, iiuc.
>
> Yes, you've correctly identified that if a hotplug change occurs
> following kexec_load (made with kexec-tools unaltered for hotplug),
> then a subsequent panic would in fact fail the purgatory digest
> verification, and kdump would not happen.
>
>>>>>
>>>>> Eric has posted a patchset to modify kexec-tools to support that,
>>>>> please see the link Eric pasted in the cover letter.
>>>>>
>>>>> http://lists.infradead.org/pipermail/kexec/2022-October/026032.html
>>>>
>>>> Right, Baoquan.
>>>>
>>>> I did see that and if I read the code correctly, without that patchset
>>>> kexec_load would fail. Not with an explicit error that hotplug support
>>>> is missing or such but it would simply fail to boot into capture kernel
>>>> with digest verification failure.
> This is correct.
>
>>>>
>>>> My suggestion was to avoid that userspace tool breakage for older
>>>> kexec-tools versions by introducing a new kexec flag that tells the
>>>> kernel that kexec-tools is ready to use this in-kernel update support.
>>>> So, if kexec_load happens without the flag, avoid doing an in-kernel
>>>> update on hotplug. I hope that clears the confusion.
>>>
>>> Yeah, sounds like a good idea. It may be extended in later patch.
>>
>> Fixing it in this series itself would be a cleaner way, I guess.
>
> Your suggestion of using a flag makes a lot of sense; it is an indication
> to the kernel that it is valid/okay to modify the kexec_load elfcorehdr.
> Only kexec-tools that understands this (meaning the elfcorehdr buffer is
> appropriately sized *and* excludes the elfcorehdr from the purgatory check)
> would set that flag.
>
> The roll-out of this feature needs to be coordinated, no doubt. There
> are three pieces to this puzzle: this kernel series, the udev rule
> changes, and the changes to kexec-tools for kexec_load.
>
> I consider the udev rule changes critical to making this feature work
> efficiently. I also think that deploying the udev rules immediately is
> doable since nothing references them yet; they would be NOPs. And they
> would be in place when the kernel and/or kexec-tools changes deploy.
>
> However, your point about supporting kexec_load with and without this
> new flag means the sysfs nodes upon which the udev rule changes rely need
> to be a bit smarter now. (I'm assuming these udev rules will be generally
> accepted as-is, as they are simple and efficient.)
>
> The sysfs crash_hotplug nodes need to take into account kexec_file_load vs
> (kexec_load && new_flag). Generally speaking, we want these crash_hotplug
> sysfs nodes to be 1 going forward, but where kexec_load/kexec-tools is
> older and/or there is no new_flag, they need to be 0. In this way the udev
> rules can remain as proposed and work properly for kexec_file_load and
> both flavors of kexec_load.
Right. That is the tricky part. kdump scripts and kexec-tools have to
be in sync if the udev rules are to rely on crash_hotplug alone.
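
To make the rule above concrete, here is a minimal sketch in C of the
value the crash_hotplug nodes would have to report. It is only an
illustration of the condition "kexec_file_load || (kexec_load && new_flag)"
discussed in this thread. The type, field, and function names are
hypothetical and not taken from the series, and "new_flag" stands in for
the proposed kexec flag, which has no name yet. The sketch also assumes
a crash image is already loaded.

```c
#include <stdbool.h>

/* Hypothetical: which syscall loaded the crash image. */
enum crash_load_method {
	CRASH_LOADED_KEXEC_FILE_LOAD,
	CRASH_LOADED_KEXEC_LOAD,
};

/* Hypothetical bookkeeping for the currently loaded crash image. */
struct crash_image_state {
	enum crash_load_method method;
	/* Placeholder for the proposed flag: set only by a kexec-tools
	 * that sizes the elfcorehdr buffer for hotplug and excludes it
	 * from the purgatory digest. */
	bool new_flag;
};

/*
 * Value the crash_hotplug sysfs nodes would report:
 *   1 - kernel updates the elfcorehdr itself on hotplug
 *   0 - userspace (udev rule) must reload the crash kernel
 */
static int crash_hotplug_value(const struct crash_image_state *state)
{
	if (state->method == CRASH_LOADED_KEXEC_FILE_LOAD)
		return 1;

	/* kexec_load: only safe when userspace opted in via the flag. */
	return state->new_flag ? 1 : 0;
}
```

With something along these lines, an older kexec-tools using kexec_load
would see 0 and keep the existing unload-then-reload behaviour, so the
digest verification failure is avoided.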
Thanks
Hari