[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ba047d83-d2e7-761f-7777-b7194411a8dc@linux.ibm.com>
Date: Wed, 12 Oct 2022 10:25:38 +0530
From: Sourabh Jain <sourabhjain@...ux.ibm.com>
To: Eric DeVolder <eric.devolder@...cle.com>,
Borislav Petkov <bp@...en8.de>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
kexec@...ts.infradead.org, ebiederm@...ssion.com,
dyoung@...hat.com, bhe@...hat.com, vgoyal@...hat.com,
tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
hpa@...or.com, nramas@...ux.microsoft.com, thomas.lendacky@....com,
robh@...nel.org, efault@....de, rppt@...nel.org, david@...hat.com,
konrad.wilk@...cle.com, boris.ostrovsky@...cle.com
Subject: Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
On 08/10/22 01:30, Eric DeVolder wrote:
>
>
> On 10/4/22 04:10, Sourabh Jain wrote:
>>
>> On 30/09/22 21:06, Eric DeVolder wrote:
>>>
>>>
>>> On 9/28/22 11:07, Borislav Petkov wrote:
>>>> On Tue, Sep 13, 2022 at 02:12:31PM -0500, Eric DeVolder wrote:
>>>>> This topic was discussed previously
>>>>> https://lkml.org/lkml/2022/3/3/372.
>>>>
>>>> Please do not use lkml.org to refer to lkml messages. We have a
>>>> perfectly fine archival system at lore.kernel.org. You simply do
>>>>
>>>> https://lore.kernel.org/r/<Message-ID>
>>>>
>>>> when you want to point to a previous mail.
>>>
>>> ok, thanks for pointing that out to me.
>>>>
>>>>> David points out that terminology is tricky here due to differing
>>>>> behaviors.
>>>>> And perhaps that is your point in asking for guidance text. It can be
>>>>> complicated
>>>>
>>>> Which means you need an explanation how to use this even more.
>>>>
>>>> And why is CONFIG_CRASH_MAX_MEMORY_RANGES even a Kconfig item and not
>>>> something you discover from the hardware?
>>>
>>> No, is the short answer.
>>>
>>>>
>>>> Your help text talks about System RAM entries in /proc/iomem which
>>>> means
>>>> that those entries are present somewhere in the kernel and you can
>>>> read
>>>> them out and do the proper calculations dynamically instead of
>>>> doing the
>>>> static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing.
>>>
>>> The intent is to compute the max size buffer needed to contain a
>>> maximum populated elfcorehdr, which is primarily based on the number
>>> of CPUs and memory regions. Thus far I (and others involved) have
>>> not found a kernel method to determine the maximum number of memory
>>> regions possible (if you are aware of one, please let me know!).
>>> Thus CONFIG_CRASH_MAX_MEMORY_RANGES was born (rather borrowed from
>>> kexec-tools).
>>>
>>> So no dynamic computation is possible, yet.
>>>
>>>>
>>>>> , but it all comes down to System RAM entries.
>>>>>
>>>>> I could perhaps offer an overly simplified example such that for
>>>>> 1GiB block
>>>>> size, for example, the CRASH_MAX_MEMORY_RANGES of 32768 would
>>>>> allow for 32TiB
>>>>> of memory?
>>>>
>>>> Yes, and stick it somewhere in Documentation/admin-guide/kdump/ and
>>>> refer to it in that help text so that people can find it and read
>>>> how to
>>>> use your new option.
>>>>
>>> ok
>>>
>>>>> The kbuf.bufsz value is obtained via a call to
>>>>> prepare_elf_headers(); I can
>>>>> not initialize it at its declaration.
>>>>
>>>> Sorry, I meant this:
>>>>
>>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>>>> index 8fc7d678ac72..ee6fd9f1b2b9 100644
>>>> --- a/arch/x86/kernel/crash.c
>>>> +++ b/arch/x86/kernel/crash.c
>>>> @@ -395,8 +395,9 @@ int crash_load_segments(struct kimage *image)
>>>> if (ret)
>>>> return ret;
>>>> - image->elf_headers = kbuf.buffer;
>>>> - image->elf_headers_sz = kbuf.bufsz;
>>>> + image->elf_headers = kbuf.buffer;
>>>> + image->elf_headers_sz = kbuf.bufsz;
>>>> + kbuf.memsz = kbuf.bufsz;
>>>> #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
>>>> /* Ensure elfcorehdr segment large enough for hotplug changes */
>>>> @@ -407,9 +408,8 @@ int crash_load_segments(struct kimage *image)
>>>> image->elf_headers_sz = kbuf.memsz;
>>>> image->elfcorehdr_index = image->nr_segments;
>>>> image->elfcorehdr_index_valid = true;
>>>> -#else
>>>> - kbuf.memsz = kbuf.bufsz;
>>>> #endif
>>>> +
>>>> kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
>>>> kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>>>> ret = kexec_add_buffer(&kbuf);
>>>>
>>> ok
>>>
>>>>> I'm at a loss as to what to do differently here. You've raised
>>>>> this issue
>>>>> before and I went back and looked at the suggestions then and I
>>>>> don't see
>>>>> how that applies to this situation. How is this situation
>>>>> different than the
>>>>> #ifdef CONFIG_KEXEC_FILE that immediately preceeds it?
>>>>
>>>> See the diff at the end. I'm not saying this is how you should do it
>>>> but it should give you a better idea. The logic being, the functions
>>>> in the .c file don't really need ifdeffery around them - you're adding
>>>> 1-2 functions and crash.c is not that big - so they can be built in
>>>> unconditionally. You'd need the ifdeffery *in the header only* when
>>>> crash.c is not being built.
>>> ok; I've overlooked that scenario.
>>>>
>>>> But I've done it with ifdeffery in the .c file now because yes, the
>>>> kexec code is a minefield of ifdeffery. Hell, there's ifdeffery
>>>> even in
>>>> the headers for structs. Ifdeffery you don't really need. Someone
>>>> should
>>>> clean that up and simplify this immensely.
>>>
>>> ok
>>>
>>>>
>>>>> Currently there is a concurrent effort for PPC support by Sourabh
>>>>> Jain, and in that effort arch_map_crash_pages() is using __va(paddr).
>>>>
>>>> Why?
>>>>
>>>>> I do not know the nuances between kmap_local_page() and __va() to
>>>>> answer the question.
>>>>
>>>> kmap_local_page() is a generic interface and it should work on any
>>>> arch.
>>>>
>>>> And it is documented even:
>>>>
>>>> $ git grep kmap_local_page Documentation/
>>>>
>>>>> If kmap_local_page() works for all archs, then I'm happy to drop
>>>>> these
>>>>> arch_ variants and use it directly.
>>>>
>>>> Yes, pls do.
>>>
>>> I'll check with Sourabh to see if PPC can work with kmap_local_page().
>> I think kmap_local_page do support on PowerPC. But can you explain
>> why we need this
>> function here, aren't the reserve memory already available to use?
>
> On x86, attempts to access the elfcorehdr without mapping it did not
> work (resulted
> in a fault).
>
> Let me know if using kmap_local_page() in place of __va() in
> arch_map_crash_pages().
> If it does, then I can eliminate arch_un/map_crash_pages() and use
> kmap_local_page()
> directly.
Hello Eric,
Atleast on ppc64 we have direct mapping available and hence just by
doing page shift
on physical address (__va) we can get valid virtual address on powerpc.
In short we don't
have to generate mapping again to access reserved region.
Regardless let's go with kdump_local_page API, it is supported on powerpc.
Thanks,
Sourabh Jain
Powered by blists - more mailing lists