Message-ID: <d91f8728-6a63-415d-577c-bd76e69ec7f6@oracle.com>
Date: Fri, 28 Oct 2022 10:29:45 -0500
From: Eric DeVolder <eric.devolder@...cle.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Baoquan He <bhe@...hat.com>, david@...hat.com,
Oscar Salvador <osalvador@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, x86@...nel.org,
kexec@...ts.infradead.org, ebiederm@...ssion.com,
dyoung@...hat.com, vgoyal@...hat.com, tglx@...utronix.de,
mingo@...hat.com, dave.hansen@...ux.intel.com, hpa@...or.com,
nramas@...ux.microsoft.com, thomas.lendacky@....com,
robh@...nel.org, efault@....de, rppt@...nel.org,
sourabhjain@...ux.ibm.com, linux-mm@...ck.org
Subject: Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
On 10/28/22 05:19, Borislav Petkov wrote:
> On Thu, Oct 27, 2022 at 02:24:11PM -0500, Eric DeVolder wrote:
>> Be aware, in reality, that if the system was fully populated, it would not
>> actually consume all 8192 phdrs. Rather /proc/iomem would essentially show a
>> large contiguous address space which would require just a single phdr.
>
> Then that from below:
>
> pnum += CONFIG_CRASH_MAX_MEMORY_RANGES;
>
> which then would end up allocating 8192 would be a total waste.
>
> So why don't you make that number dynamic then?
>
> You start with something sensible:
>
> total_num_pheaders = num_online_cpus() + "some number of regions" + "some few others"
>
> I.e., a number which is a good compromise on the majority of machines.
>
> Then, on hotplug events you count how many new regions are coming in
> and when you reach the total_num_pheaders number, you double it (or some
> other increase strategy), reallocate the ELF header buffers etc needed
> for kdump and you're good.
>
> This way, you don't waste memory unnecessarily on the majority of
> systems and those who need more, get to allocate more.
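For illustration, the doubling strategy suggested above might look roughly like the sketch below. All names and types here are invented for the sketch, not actual kernel identifiers:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the grow-on-demand sizing suggested above:
 * start from a sensible baseline and double the phdr capacity when a
 * hotplug event would exceed it.  Names are illustrative only. */
struct phdr_pool {
	size_t used;		/* phdrs currently in use */
	size_t capacity;	/* phdrs the ELF header buffer can hold */
};

/* Grow (by doubling) until there is room for `incoming` more phdrs;
 * returns the resulting capacity.  A real implementation would also
 * reallocate the ELF header buffer/segment at this point. */
static size_t pool_ensure(struct phdr_pool *p, size_t incoming)
{
	while (p->used + incoming > p->capacity)
		p->capacity *= 2;
	return p->capacity;
}
```

The catch, as described below, is that "reallocate the ELF header buffer" is exactly the step that has no supporting allocator inside the crash kernel reserved area today.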
This patch series sizes and allocates the memory buffer/segment for the elfcorehdr once, at kdump
load time.
Dynamically resizing the elfcorehdr memory buffer/segment would cause the following ripple
effects:
- Permitting resizing of the elfcorehdr requires a means to "allocate" a new, larger buffer from
within the crash kernel reserved area. There is no allocator today; currently placement is a kind
of one-pass process that happens at load time. The critical side effect of allocating a new
elfcorehdr buffer/segment is that the elfcorehdr ends up at a new address.
- The elfcorehdr is passed to the crash kernel via the elfcorehdr= kernel cmdline option. As such,
a dynamic change to the elfcorehdr size necessarily changes the address of that buffer, and
therefore requires rewriting the crash kernel cmdline to reflect the new elfcorehdr buffer
address.
- A change to the cmdline also invites a possible change of address of the buffer containing the
cmdline, and thus a change to the x86 boot_params, which contains the cmdline pointer.
- A change to the cmdline and/or boot_params, which are *not* excluded from the hash/digest, means
that the purgatory hash/digest needs to be recomputed, and purgatory re-linked with the new
hash/digest and replaced.
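To make the cmdline ripple concrete: the elfcorehdr address is baked into a string on the crash kernel cmdline, so a new buffer address means a rewritten cmdline. A minimal sketch (the helper name is invented, not a kernel function):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative helper only: formats the elfcorehdr= parameter the
 * crash kernel is booted with.  Any change to the elfcorehdr buffer
 * address means this string, and hence the cmdline containing it,
 * must be rewritten -- with the boot_params and purgatory knock-on
 * effects described above. */
static int build_elfcorehdr_param(char *buf, size_t len,
				  unsigned long long addr)
{
	return snprintf(buf, len, "elfcorehdr=0x%llx", addr);
}
```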
A fair amount of work, but I had this working in the past, around the v1 patch series timeframe.
However, it only worked for the kexec_file_load() syscall, as all the needed pieces of information
were available there; for kexec_load(), it isn't possible to relink purgatory, since by that point
purgatory is just a user-space binary blob.
It was feedback on v1/v2 that pointed out that excluding the elfcorehdr from the hash/digest makes
the "change of address" problem with the elfcorehdr buffer/segment go away, and, in turn, negates
the need to introduce an allocator for the crash kernel reserved space, rewrite the crash kernel
cmdline with a new elfcorehdr address, update boot_params with a new cmdline, and re-link and
replace purgatory with the updated digest. It also enables this hotplug effort to support the
kexec_load() syscall.
So it is with this in mind that I suggest we stay with the statically sized elfcorehdr buffer.
If that can be agreed upon, then it is "just a matter" of picking a useful elfcorehdr size.
Currently that size is derived from NR_DEFAULT_CPUS and CRASH_MAX_MEMORY_RANGES. So there is still
the CRASH_MAX_MEMORY_RANGES knob to help dial in the size, should there be some issue with the
default value.
Or, if there is a desire to drop computing the size from NR_DEFAULT_CPUS and
CRASH_MAX_MEMORY_RANGES and instead go with a CRASH_HOTPLUG_ELFCOREHDR_SZ that directly specifies
the buffer size, then I'm also good with that.
I still owe a much better explanation of how to size the elfcorehdr. I can use the comments and
ideas from this discussion to provide the necessary insight for choosing that value, whether it be
CRASH_MAX_MEMORY_RANGES or CRASH_HOTPLUG_ELFCOREHDR_SZ.
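For reference, the static sizing being discussed boils down to arithmetic along these lines. The +1 for an extra note and the example limits are assumptions for the sketch, not the patch's actual values:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Sketch of the static, load-time sizing described above: one phdr
 * per possible CPU (PT_NOTE) plus one per potential memory range
 * (PT_LOAD).  The +1 for an extra note (e.g. vmcoreinfo) and the
 * example limits below are illustrative assumptions. */
#define EXAMPLE_NR_CPUS			512
#define EXAMPLE_MAX_MEMORY_RANGES	8192

static size_t elfcorehdr_size(size_t nr_cpus, size_t nr_mem_ranges)
{
	size_t phnum = nr_cpus + nr_mem_ranges + 1;

	return sizeof(Elf64_Ehdr) + phnum * sizeof(Elf64_Phdr);
}
```

With the example limits above, on x86_64 that is 64 + (512 + 8192 + 1) * 56 = 487544 bytes, i.e. a bit under half a megabyte reserved up front regardless of how many ranges /proc/iomem actually shows, which is the memory-waste concern raised in the quoted reply.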
>
>> I'd prefer keeping CRASH_MAX_MEMORY_RANGES as that allow the maximum phdr
>> number value to be reflective of CPUs and/or memory; not all systems support
>> both CPU and memory hotplug. For example, I have queued up this change to
>> reflect this:
>>
>> if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
>
> If you're going to keep CRASH_MAX_MEMORY_RANGES, then you can test only
> that thing as it expresses the dependency on CONFIG_HOTPLUG_CPU and
> CONFIG_MEMORY_HOTPLUG already.
>
> If you end up making the number dynamic, then you could make that a
> different Kconfig item which contains all that crash code as most of the
> people won't need it anyway.
It is my intention to correct the CRASH_MAX_MEMORY_RANGES (if we keep it) as such:
config CRASH_MAX_MEMORY_RANGES
        depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
CRASH_MAX_MEMORY_RANGES should have never had CPU_HOTPLUG as a dependency; that was a cut-n-paste
error on my part.
>
> Hmm?
>
Thank you for the time and thought on this topic!
eric