Message-ID: <64a93c4a-5619-4208-9e9f-83848206d42b@linaro.org>
Date: Mon, 25 Aug 2025 15:55:07 +0300
From: Eugen Hristev <eugen.hristev@...aro.org>
To: David Hildenbrand <david@...hat.com>, Michal Hocko <mhocko@...e.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
linux-arch@...r.kernel.org, linux-mm@...ck.org, tglx@...utronix.de,
andersson@...nel.org, pmladek@...e.com,
linux-arm-kernel@...ts.infradead.org, linux-hardening@...r.kernel.org,
corbet@....net, mojha@....qualcomm.com, rostedt@...dmis.org,
jonechou@...gle.com, tudor.ambarus@...aro.org,
Christoph Hellwig <hch@...radead.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [RFC][PATCH v2 22/29] mm/numa: Register information into Kmemdump
On 8/4/25 16:26, David Hildenbrand wrote:
> On 04.08.25 15:03, Eugen Hristev wrote:
>>
>>
>> On 8/4/25 15:49, David Hildenbrand wrote:
>>> On 04.08.25 14:29, Eugen Hristev wrote:
>>>>
>>>>
>>>> On 8/4/25 15:18, David Hildenbrand wrote:
>>>>> On 04.08.25 13:06, Eugen Hristev wrote:
>>>>>>
>>>>>>
>>>>>> On 8/4/25 13:54, Michal Hocko wrote:
>>>>>>> On Wed 30-07-25 16:04:28, David Hildenbrand wrote:
>>>>>>>> On 30.07.25 15:57, Eugen Hristev wrote:
>>>>>>> [...]
>>>>>>>>> Yes, registering after is also an option. Initially this is how I
>>>>>>>>> designed the kmemdump API, I also had in mind to add a flag, but, after
>>>>>>>>> discussing with Thomas Gleixner, he came up with the macro wrapper idea
>>>>>>>>> here:
>>>>>>>>> https://lore.kernel.org/lkml/87ikkzpcup.ffs@tglx/
>>>>>>>>> Do you think we can continue that discussion, or maybe start it here?
>>>>>>>>
>>>>>>>> Yeah, I don't like that, but I can see how we ended up here.
>>>>>>>>
>>>>>>>> I also don't quite like the idea that we must encode here what to include in
>>>>>>>> a dump and what not ...
>>>>>>>>
>>>>>>>> For the vmcore we construct it at runtime in crash_save_vmcoreinfo_init(),
>>>>>>>> where we e.g., have
>>>>>>>>
>>>>>>>> VMCOREINFO_STRUCT_SIZE(pglist_data);
>>>>>>>>
>>>>>>>> Could we similarly have some place where we construct what to dump,
>>>>>>>> just not using the current values, but the memory ranges?
>>>>>>>
>>>>>>> All those symbols are part of kallsyms, right? Can we just use kallsyms
>>>>>>> infrastructure and a list of symbols to get what we need from there?
>>>>>>>
>>>>>>> In other words the list of symbols to be completely external to the code
>>>>>>> that is defining them?
>>>>>>
>>>>>> Some static symbols are indeed part of kallsyms. But some symbols are
>>>>>> not exported, for example patch 20/29, where printk related symbols are
>>>>>> not to be exported. Another example is with static variables, like in
>>>>>> patch 17/29, not exported as symbols, but required for the dump.
>>>>>> Dynamic memory regions also have to be considered; have a look,
>>>>>> for example, at patch 23/29, where dynamically allocated memory needs to
>>>>>> be registered.
>>>>>>
>>>>>> Do you think that I should move all kallsyms related symbols annotation
>>>>>> into a separate place and keep it for the static/dynamic regions in place ?
>>>>>
>>>>> If you want to use a symbol from kmemdump, then make that symbol
>>>>> available to kmemdump.
>>>>
>>>> That's what I am doing, registering symbols with kmemdump.
>>>> Maybe I do not understand what you mean, do you have any suggestion for
>>>> the static variables case (symbols not exported) ?
>>>
>>> Let's use patch #20 as example:
>>>
>>> What I am thinking is that you would not include "linux/kmemdump.h" and
>>> not leak all of that KMEMDUMP_ stuff in all these files/subsystems that
>>> couldn't care less about kmemdump.
>>>
>>> Instead of doing
>>>
>>> static struct printk_ringbuffer printk_rb_dynamic;
>>>
>>> You'd do
>>>
>>> struct printk_ringbuffer printk_rb_dynamic;
>>>
>>> and have it in some header file, from where kmemdump could lookup the
>>> address.
>>>
>>> So you move the logic of what goes into a dump from the subsystems to
>>> the kmemdump core.
>>>
>>
>> That works if the people maintaining these systems agree with it.
>> Attempts to export symbols from printk e.g. have been nacked :
>>
>> https://lore.kernel.org/all/20250218-175733-neomutt-senozhatsky@chromium.org/
>
> Do you really need the EXPORT_SYMBOL?
>
> Can't you just not export symbols, building the relevant kmemdump part
> into the core, not as a module.
>
> IIRC, kernel/vmcore_info.c is never built as a module, as it also
> accesses non-exported symbols.
Hello David,
I am looking into this again, and some things would, in my opinion, be
difficult to achieve.
For example, I looked at my patch #11, which adds `runqueues` to
kmemdump.
`runqueues` is a variable of type `struct rq`, which is defined in
kernel/sched/sched.h, a header that is not supposed to be included
outside of sched.
Moving the whole struct definition out of sched.h into a public header
would be rather painful, and I don't think it's a good option (the
struct would be needed to compute the sizeof inside vmcoreinfo). It
would also imply moving all the nested struct definitions out as well.
I doubt this is something that we want for the sched subsystem. As far
as I understand, the subsystem is designed to keep these internal
structs opaque to code outside of it.
From my perspective, it's much simpler and cleaner to just add the
kmemdump annotation macro inside sched/core.c, as is done in my
patch. This macro translates to a no-op if kmemdump is not selected.
How do you see this being done another way?
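To illustrate the point, here is a minimal user-space sketch of the idea.
It is not the macro from the patch series: DEMO_KMEMDUMP, DEMO_DUMP_VAR,
demo_register() and struct demo_rq are all hypothetical names made up for
this example. It only shows that the registration sits in the file that
can see the complete type, so sizeof() works there without moving the
struct into a public header, and that the macro compiles to nothing when
the feature is switched off.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kmemdump Kconfig option. */
#ifndef DEMO_KMEMDUMP
#define DEMO_KMEMDUMP 1
#endif

/* One registered memory range: name, start address, length. */
struct demo_region {
	const char *name;
	const void *addr;
	size_t size;
};

#define DEMO_MAX_REGIONS 8
static struct demo_region demo_regions[DEMO_MAX_REGIONS];
static size_t demo_nr_regions;

/* Hypothetical core-side registration: append one range to the table. */
static void demo_register(const char *name, const void *addr, size_t size)
{
	assert(demo_nr_regions < DEMO_MAX_REGIONS);
	demo_regions[demo_nr_regions++] =
		(struct demo_region){ name, addr, size };
}

#if DEMO_KMEMDUMP
/* sizeof() is evaluated where the macro is used, i.e. where the type is complete. */
#define DEMO_DUMP_VAR(var) demo_register(#var, &(var), sizeof(var))
#else
/* Feature disabled: the annotation expands to an empty statement. */
#define DEMO_DUMP_VAR(var) do { } while (0)
#endif

/* Stand-in for a subsystem-private type that only this file can see. */
struct demo_rq {
	unsigned int nr_running;
	unsigned long clock;
};

static struct demo_rq demo_runqueues[4];

/* Runs inside the defining file, so no private header has to leak out. */
static void demo_sched_init(void)
{
	DEMO_DUMP_VAR(demo_runqueues);
}
```

With DEMO_KMEMDUMP set to 0 the call in demo_sched_init() disappears and
the table stays empty, which is the behaviour I describe above for the
real annotation.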
>
>>
>> So I am unsure whether just removing the static and adding them into
>> header files would be more acceptable.
>>
>> Added in CC Christoph Hellwig and Sergey Senozhatsky; maybe they could
>> tell us directly whether they like or dislike this approach, as kmemdump
>> would be builtin and would not require exports.
>>
>> One other thing to mention is the fact that the printk code dynamically
>> allocates memory that would need to be registered. There is no mechanism
>> for kmemdump to know when this process has been completed (or even if it
>> was at all, because it happens on demand in certain conditions).
>
> If we are talking about memblock allocations, they sure are finished at
> the time ... the buddy is up.
>
> So it's just a matter of placing yourself late in the init stage where
> the buddy is already up and running.
>
> I assume dumping any dynamically allocated stuff through the buddy is
> out of the picture for now.
>
The dumping mechanism needs to work for dynamically allocated stuff, and
right now it works for e.g. printk, if the buffer is dynamically
allocated later on in the boot process.
To have this working outside of printk, it would be required to walk
through all the printk structs/allocations and select the required info.
Is this something that we want to do outside of printk? E.g. for the
printk panic-dump case, the whole dumping is done by registering a
dumper that does the job inside printk. There is no mechanism walking
through printk data in another subsystem (in my example, pstore).
So for me it is logical to register the data inside printk.
Does this make sense?
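As a rough user-space sketch of what I mean by registering at the
allocation site: the code that owns the buffer is the only place that
knows when, and whether, the allocation happened, so it registers the
range itself right after allocating. All names below (demo_dyn_register,
demo_dyn_unregister, demo_log_buf_setup, ...) are hypothetical and are
not the kmemdump or printk API; malloc() stands in for the kernel
allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One dynamically registered range; addr == NULL marks a free slot. */
struct demo_dyn_region {
	void *addr;
	size_t size;
};

#define DEMO_DYN_MAX 8
static struct demo_dyn_region demo_dyn_table[DEMO_DYN_MAX];

/* Hypothetical core API: returns the slot index, or -1 if the table is full. */
static int demo_dyn_register(void *addr, size_t size)
{
	for (int i = 0; i < DEMO_DYN_MAX; i++) {
		if (!demo_dyn_table[i].addr) {
			demo_dyn_table[i].addr = addr;
			demo_dyn_table[i].size = size;
			return i;
		}
	}
	return -1;
}

/* Hypothetical core API: release a slot obtained from demo_dyn_register(). */
static void demo_dyn_unregister(int slot)
{
	assert(slot >= 0 && slot < DEMO_DYN_MAX);
	demo_dyn_table[slot].addr = NULL;
	demo_dyn_table[slot].size = 0;
}

/* The owner of the buffer, playing the role of printk in this sketch. */
static char *demo_log_buf;
static int demo_log_slot = -1;

/* On-demand allocation: the range is registered as soon as it exists. */
static int demo_log_buf_setup(size_t len)
{
	char *buf = malloc(len);

	if (!buf)
		return -1;
	demo_log_buf = buf;
	demo_log_slot = demo_dyn_register(buf, len);
	return 0;
}

/* Unregister before freeing so the table never points at freed memory. */
static void demo_log_buf_teardown(void)
{
	if (demo_log_slot >= 0)
		demo_dyn_unregister(demo_log_slot);
	free(demo_log_buf);
	demo_log_buf = NULL;
	demo_log_slot = -1;
}
```

An external walker would instead have to know the owner's internal
layout and guess whether demo_log_buf_setup() has run yet, which is the
coupling I would like to avoid.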