[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <875x9a6cpw.fsf@jogness.linutronix.de>
Date: Fri, 09 Jan 2026 16:19:31 +0106
From: John Ogness <john.ogness@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: Breno Leitao <leitao@...ian.org>, mpdesouza@...e.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org, asantostc@...il.com,
efault@....de, gustavold@...il.com, calvin@...nvd.org, jv@...sburgh.net,
kernel-team@...a.com, Simon Horman <horms@...nel.org>, Andrew Lunn
<andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, Eric
Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo
Abeni <pabeni@...hat.com>, rostedt@...dmis.org
Subject: Re: [PATCH net-next 0/2] net: netconsole: convert to NBCON console
infrastructure
On 2026-01-09, Petr Mladek <pmladek@...e.com> wrote:
> If we were adding this info, I would add it for all users, definitely.
> We are going to touch the internal printk ringbuffer format anyway.
Which could mean we are going to need a round of updating crash/dump
tools. So we should choose wisely.
> Now, the main question is how far we want to go.
>
> I see the following possibilities:
>
> A) caller_id -> pid + cpu + atomic/task context bit
> ===================================================
>
> Replace the current .caller_id and always save both PID and CPU
> number. Something like:
>
> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 4ef81349d9fb..3c7635ada6dd 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -9,6 +9,12 @@
> #include <linux/stddef.h>
> #include <linux/types.h>
>
> +
> +struct printk_caller {
> + pid_t pid; /* caller pid */
> + int cpu; /* caller CPU number */
> +};
> +
> /*
> * Meta information about each stored message.
> *
> @@ -22,8 +29,8 @@ struct printk_info {
> u8 facility; /* syslog facility */
> u8 flags:5; /* internal record flags */
> u8 level:3; /* syslog level */
> - u32 caller_id; /* thread id or processor id */
>
> + struct printk_caller caller;
> struct dev_printk_info dev_info;
> };
>
> Plus the task/interrupt bit might be stored either as the highest bit
> in .pid or by adding LOG_CALLER_IN_TASK bit into "enum printk_info_flags".
The thing I find attractive about this solution is that it could be done
such that crash/dump tools must not be changed. We could leave the
semantics for @caller_id as is and simply add @cpu:
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -23,6 +23,7 @@ struct printk_info {
u8 flags:5; /* internal record flags */
u8 level:3; /* syslog level */
u32 caller_id; /* thread id or processor id */
+ u32 cpu; /* processor id */
struct dev_printk_info dev_info;
};
After all, if the caller is not in_task() then the new @pid would be
meaningless anyway.
If we are willing to accept printer-resolution of task names, then this
simple addition would be good enough for netconsole, while not requiring
any crash/dump tool updates. This would buy us time to think more
seriously about a significant overhaul.
> B) caller_id -> pid + cpu + contex
> ==================================
>
> Same as above but the caller context info is stored separately and
> might allow to distinguish more types. Something like:
>
> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 4ef81349d9fb..44aa60a3f84c 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -9,6 +9,19 @@
> #include <linux/stddef.h>
> #include <linux/types.h>
>
> +#define PRINTK_CALLER_CTXT_PREEMPT_BIT 1
> +#define PRINTK_CALLER_CTXT_SOFTIRQ_BIT 2
> +#define PRINTK_CALLER_CTXT_HARDIRQ_BIT 3
> +#define PRINTK_CALLER_CTXT_NMI 4
> +/* This is not supported on all platforms, see irqs_disabled() */
> +#define PRINTK_CALLER_CTXT_IRQS_DISABLED 5
> +
> +struct printk_caller {
> + u8 ctxt; /* caller context: task, irq, ... */
> + pid_t pid; /* caller pid */
> + int cpu; /* caller CPU number */
> +};
> +
> /*
> * Meta information about each stored message.
> *
> @@ -22,8 +35,8 @@ struct printk_info {
> u8 facility; /* syslog facility */
> u8 flags:5; /* internal record flags */
> u8 level:3; /* syslog level */
> - u32 caller_id; /* thread id or processor id */
>
> + struct printk_caller caller;
> struct dev_printk_info dev_info;
> };
Just as with A, here we could also preserve @caller_id semantics
(instead of introducing @pid) to avoid crash/dump tool updates.
A new @ctxt field is only useful if it can be seen. I am not sure how
you plan on showing this. By extending the prefix like caller_id does?
Would it be a new kernel config or just use CONFIG_PRINTK_CALLER? Either
way, we are talking about extending visible log data, which is something
that needs a discussion on its own.
> C) caller_id -> pid + cpu + comm + atomic/task context bit
> ==========================================================
>
> Similar to A and B but add also
>
> char comm[TASK_COMM_LEN];
>
> into struct printk_caller.
Just as with A and B, we could preserve @caller_id semantics.
This is the simplest solution since the printer would not need to
perform any resolution at all and it would be 100% accurate (when
in_task).
> D) caller_id -> pid + cpu + comm + atomic/task context bit
> and move printk_caller + printk_dev_info into text buffer
> as suggested as mentioned at
> https://lore.kernel.org/lkml/84y10vz7ty.fsf@jogness.linutronix.de
> ====================================================================
>
> There are many possibilities how to do it. And it involves two
> problems:
>
> a) how to store the data into data buffer
> b) how to integrate this in the ring buffer API
>
> I thought about several approaches and realized that it still
> would make sense to keep:
>
> + binary data (pid, cpu, ctxt) in desc ring
> + text data (comm, subsystem, device) in text_data buffer
>
> Something like:
>
> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 4ef81349d9fb..41232bf2919d 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -9,6 +9,27 @@
> #include <linux/stddef.h>
> #include <linux/types.h>
>
> +#define PRINTK_CALLER_CTXT_PREEMPT_BIT 0
> +#define PRINTK_CALLER_CTXT_SOFTIRQ_BIT 1
> +#define PRINTK_CALLER_CTXT_HARDIRQ_BIT 2
> +#define PRINTK_CALLER_CTXT_NMI 3
> +/* This is not supported on all platforms, see irqs_disabled() */
> +#define PRINTK_CALLER_CTXT_IRQS_DISABLED 4
> +
> +struct printk_caller {
> + u8 ctxt; /* caller context: task, irq, ... */
> + pid_t pid; /* caller pid */
> + int cpu; /* caller CPU number */
> +};
> +
> +/*
> + * Describe which text information about the caller
> + * is stored in text_data_ring
> + */
> +#define PRINTK_CALLER_BITS_COMM_BIT 0
> +#define PRINTK_CALLER_BITS_SUBSYSTEM_BIT 1
> +#define PRINTK_CALLER_BITS_DEVICE_BIT 2
> +
> /*
> * Meta information about each stored message.
> *
> @@ -22,9 +43,8 @@ struct printk_info {
> u8 facility; /* syslog facility */
> u8 flags:5; /* internal record flags */
> u8 level:3; /* syslog level */
> - u32 caller_id; /* thread id or processor id */
> -
> - struct dev_printk_info dev_info;
> + u8 caller_bits_size; /* size of the caller info in data buffer */
> + u8 caller_bits; /* bits describing caller in data buffer */
> };
>
> /*
>
> The text would be stored in the test_data_ring in the following
> format:
>
> [<comm>\0][<subsystem>\0][<device\0]<text>\0
I would prefer:
<text>\0[<subsystem>\0][<device>\0][<comm>\0]
Crash/dump tools that are not updated would at least continue to print
the text at the beginning of the line. Here are a few projects that I
track and how they would react (unmodified):
makedumpfile
- https://github.com/makedumpfile/makedumpfile/blob/master/printk.c#L134
- will print "\x00" between the fields
vmcore-dmesg
- https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/util_lib/elf_info.c?h=main#n904
- will print "\x00" between the fields
crash
- https://github.com/crash-utility/crash/blob/master/printk.c#L210
- will print "." between the fields
> aka, strings separated by '\0'. The information about the caller
> would be optional where:
>
> + .caller_bits_size would contain the size of the optional
> pieces, including the trailing '\0's.
I do not know if we would need this. @text_len could include the caller
area as well.
> + .caller_bits number would define which particular pieces
> are there. It would be even somehow extensible. The crash
> tool could ignore parts which are not supported.
Yes, this works well.
> My opinion:
> ===========
>
> I personally think that the approach B) is the best compromise for now
> because:
>
> 1. We really should distinguish task/interrupt context. IMHO, it is
> a very useful information provided by the caller_id now.
>
> The approach A) stores the context a hacky way. The approach B)
> is cleaner and provides even more precise info.
>
>
> 2. The mapping between PID and COMM might get lost if we do not store it.
> But it should not be problem most of the time because we try
> to flush consoles ASAP.
>
> I would keep it simple for now. We could add it later when it
> becomes a problem in practice.
>
> BTW: I think that we could detect when the mapping is valid
> by comparing task->start_time with current time...
>
> Devil advocate: Adding comm[TASK_COMM_LEN] into struct printk_info
> might be acceptable. It is "just" 16 bytes in compare
> with 64 bytes for dev_printk.
I am OK with A, B, or C if we can keep the @caller_id semantics. This
would not require any changes to the LOG_CONT implementation or any
other exists buffer preparation code and it would not require and
changes do crash/dump tools. I.e. it would simply be making new fields
available for netconsole.
I can accept that we want to avoid C for now until we solve the
efficient space issue.
(My official preference list is below...)
> 3. It would be nice to optimize the memory usage and store the
> optional and variable strings (comm, subsystem, device) into
> data buffer. But it looks like a non-trivial work.
>
> I would do this only when there is a real demand. And I haven't
> heard about that the current approach was not acceptable yet.
Well, the whole reason the topic came up is because of a complaint from
Geert. And I think if people knew just how much space was being wasted,
they might speak up. For example, on my Debian distro kernel, I have a
256KB text_data_ring and an 896KB desc_ring. I expect that the total
usage would cut in half if we packed the dev_printk_info data into the
text_data_ring. I can prototype some tests (just to satisfy my
curiousity).
I do not think it would be much work. Whether the strings are copied
from text_data arrays or dev_printk_info structs does not make much
difference. There is only a slightly larger overhead to identify their
existance and offsets. (Currently it is a single memcpy() of
dev_printk_info.)
But we would need to update the crash/dump tools. So I am not in a rush
to implement D right now.
My ordered preferences for right now would be:
1. keeping @caller_id semantics + adding @cpu + adding @comm (similar to
your C)
2. keeping @caller_id semantics + adding @cpu (similar to your A)
These additions would be purely to support netconsole and the additions
would be under CONFIG_NETCONSOLE_DYNAMIC. No crash/dump tools should be
changed due to this.
For the long term I would like to move all strings into the
text_data_ring. We could use that opportunity to add context bits, time
stamps, etc. and then appropriately update the major crash/dump tools.
John
Powered by blists - more mailing lists