linux-kernel - Re: [PATCH net-next 0/2] net: netconsole: convert to NBCON console infrastructure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <875x9a6cpw.fsf@jogness.linutronix.de>
Date: Fri, 09 Jan 2026 16:19:31 +0106
From: John Ogness <john.ogness@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: Breno Leitao <leitao@...ian.org>, mpdesouza@...e.com,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org, asantostc@...il.com,
 efault@....de, gustavold@...il.com, calvin@...nvd.org, jv@...sburgh.net,
 kernel-team@...a.com, Simon Horman <horms@...nel.org>, Andrew Lunn
 <andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, Eric
 Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo
 Abeni <pabeni@...hat.com>, rostedt@...dmis.org
Subject: Re: [PATCH net-next 0/2] net: netconsole: convert to NBCON console
 infrastructure

On 2026-01-09, Petr Mladek <pmladek@...e.com> wrote:
> If we were adding this info, I would add it for all users, definitely.
> We are going to touch the internal printk ringbuffer format anyway.

Which could mean we are going to need a round of updating crash/dump
tools. So we should choose wisely.

> Now, the main question is how far we want to go.
>
> I see the following possibilities:
>
> A) caller_id -> pid + cpu + atomic/task context bit
> ===================================================
>
> Replace the current .caller_id and always save both PID and CPU
> number. Something like:
>
> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 4ef81349d9fb..3c7635ada6dd 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -9,6 +9,12 @@
>  #include <linux/stddef.h>
>  #include <linux/types.h>
>  
> +
> +struct printk_caller {
> +	pid_t	pid;	/* caller pid */
> +	int	cpu;	/* caller CPU number */
> +};
> +
>  /*
>   * Meta information about each stored message.
>   *
> @@ -22,8 +29,8 @@ struct printk_info {
>  	u8	facility;	/* syslog facility */
>  	u8	flags:5;	/* internal record flags */
>  	u8	level:3;	/* syslog level */
> -	u32	caller_id;	/* thread id or processor id */
>  
> +	struct printk_caller caller;
>  	struct dev_printk_info	dev_info;
>  };
>
> Plus the task/interrupt bit might be stored either as the highest bit
> in .pid or by adding LOG_CALLER_IN_TASK bit into "enum printk_info_flags".

The thing I find attractive about this solution is that it could be done
such that crash/dump tools must not be changed. We could leave the
semantics for @caller_id as is and simply add @cpu:

--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -23,6 +23,7 @@ struct printk_info {
 	u8	flags:5;	/* internal record flags */
 	u8	level:3;	/* syslog level */
 	u32	caller_id;	/* thread id or processor id */
+	u32	cpu;	/* processor id */
 
 	struct dev_printk_info	dev_info;
 };

After all, if the caller is not in_task() then the new @pid would be
meaningless anyway.

If we are willing to accept printer-resolution of task names, then this
simple addition would be good enough for netconsole, while not requiring
any crash/dump tool updates. This would buy us time to think more
seriously about a significant overhaul.

> B) caller_id -> pid + cpu + contex
> ==================================
>
> Same as above but the caller context info is stored separately and
> might allow to distinguish more types. Something like:
>
> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 4ef81349d9fb..44aa60a3f84c 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -9,6 +9,19 @@
>  #include <linux/stddef.h>
>  #include <linux/types.h>
>  
> +#define PRINTK_CALLER_CTXT_PREEMPT_BIT		1
> +#define PRINTK_CALLER_CTXT_SOFTIRQ_BIT		2
> +#define PRINTK_CALLER_CTXT_HARDIRQ_BIT		3
> +#define PRINTK_CALLER_CTXT_NMI			4
> +/* This is not supported on all platforms, see irqs_disabled() */
> +#define PRINTK_CALLER_CTXT_IRQS_DISABLED	5
> +
> +struct printk_caller {
> +	u8	ctxt;	/* caller context: task, irq, ... */
> +	pid_t	pid;	/* caller pid */
> +	int	cpu;	/* caller CPU number */
> +};
> +
>  /*
>   * Meta information about each stored message.
>   *
> @@ -22,8 +35,8 @@ struct printk_info {
>  	u8	facility;	/* syslog facility */
>  	u8	flags:5;	/* internal record flags */
>  	u8	level:3;	/* syslog level */
> -	u32	caller_id;	/* thread id or processor id */
>  
> +	struct printk_caller caller;
>  	struct dev_printk_info	dev_info;
>  };

Just as with A, here we could also preserve @caller_id semantics
(instead of introducing @pid) to avoid crash/dump tool updates.

A new @ctxt field is only useful if it can be seen. I am not sure how
you plan on showing this. By extending the prefix like caller_id does?
Would it be a new kernel config or just use CONFIG_PRINTK_CALLER? Either
way, we are talking about extending visible log data, which is something
that needs a discussion on its own.

> C) caller_id -> pid + cpu + comm + atomic/task context bit
> ==========================================================
>
> Similar to A and B but add also
>
> 	char comm[TASK_COMM_LEN];
>
> into struct printk_caller.

Just as with A and B, we could preserve @caller_id semantics.

This is the simplest solution since the printer would not need to
perform any resolution at all and it would be 100% accurate (when
in_task).

> D) caller_id -> pid + cpu + comm + atomic/task context bit
>    and move printk_caller + printk_dev_info into text buffer
>    as suggested as mentioned at
>    https://lore.kernel.org/lkml/84y10vz7ty.fsf@jogness.linutronix.de
> ====================================================================
>
>    There are many possibilities how to do it. And it involves two
>    problems:
>
>       a) how to store the data into data buffer
>       b) how to integrate this in the ring buffer API
>
>     I thought about several approaches and realized that it still
>     would make sense to keep:
>
>        + binary data (pid, cpu, ctxt) in desc ring
>        + text data (comm, subsystem, device) in text_data buffer
>
>     Something like:
>
> diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
> index 4ef81349d9fb..41232bf2919d 100644
> --- a/kernel/printk/printk_ringbuffer.h
> +++ b/kernel/printk/printk_ringbuffer.h
> @@ -9,6 +9,27 @@
>  #include <linux/stddef.h>
>  #include <linux/types.h>
>  
> +#define PRINTK_CALLER_CTXT_PREEMPT_BIT		0
> +#define PRINTK_CALLER_CTXT_SOFTIRQ_BIT		1
> +#define PRINTK_CALLER_CTXT_HARDIRQ_BIT		2
> +#define PRINTK_CALLER_CTXT_NMI			3
> +/* This is not supported on all platforms, see irqs_disabled() */
> +#define PRINTK_CALLER_CTXT_IRQS_DISABLED	4
> +
> +struct printk_caller {
> +	u8	ctxt;	/* caller context: task, irq, ... */
> +	pid_t	pid;	/* caller pid */
> +	int	cpu;	/* caller CPU number */
> +};
> +
> +/*
> + * Describe which text information about the caller
> + * is stored in text_data_ring
> + */
> +#define PRINTK_CALLER_BITS_COMM_BIT		0
> +#define PRINTK_CALLER_BITS_SUBSYSTEM_BIT	1
> +#define PRINTK_CALLER_BITS_DEVICE_BIT		2
> +
>  /*
>   * Meta information about each stored message.
>   *
> @@ -22,9 +43,8 @@ struct printk_info {
>  	u8	facility;	/* syslog facility */
>  	u8	flags:5;	/* internal record flags */
>  	u8	level:3;	/* syslog level */
> -	u32	caller_id;	/* thread id or processor id */
> -
> -	struct dev_printk_info	dev_info;
> +	u8	caller_bits_size; /* size of the caller info in data buffer */
> +	u8	caller_bits;	/* bits describing caller in data buffer */
>  };
>  
>  /*
>
> The text would be stored in the test_data_ring in the following
> format:
>
> 	[<comm>\0][<subsystem>\0][<device\0]<text>\0

I would prefer:

<text>\0[<subsystem>\0][<device>\0][<comm>\0]

Crash/dump tools that are not updated would at least continue to print
the text at the beginning of the line. Here are a few projects that I
track and how they would react (unmodified):

makedumpfile
- https://github.com/makedumpfile/makedumpfile/blob/master/printk.c#L134
- will print "\x00" between the fields

vmcore-dmesg
- https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/util_lib/elf_info.c?h=main#n904
- will print "\x00" between the fields

crash
- https://github.com/crash-utility/crash/blob/master/printk.c#L210
- will print "." between the fields

> aka, strings separated by '\0'. The information about the caller
> would be optional where:
>
>    + .caller_bits_size would contain the size of the optional
>      pieces, including the trailing '\0's.

I do not know if we would need this. @text_len could include the caller
area as well.

>    + .caller_bits number would define which particular pieces
>      are there. It would be even somehow extensible. The crash
>      tool could ignore parts which are not supported.

Yes, this works well.

> My opinion:
> ===========
>
> I personally think that the approach B) is the best compromise for now
> because:
>
> 1. We really should distinguish task/interrupt context. IMHO, it is
>    a very useful information provided by the caller_id now.
>
>    The approach A) stores the context a hacky way. The approach B)
>    is cleaner and provides even more precise info.
>
>
> 2. The mapping between PID and COMM might get lost if we do not store it.
>    But it should not be problem most of the time because we try
>    to flush consoles ASAP.
>
>    I would keep it simple for now. We could add it later when it
>    becomes a problem in practice.
>
>    BTW: I think that we could detect when the mapping is valid
> 	by comparing task->start_time with current time...
>
>    Devil advocate: Adding comm[TASK_COMM_LEN] into struct printk_info
> 		might be acceptable. It is "just" 16 bytes in compare
> 		with 64 bytes for dev_printk.

I am OK with A, B, or C if we can keep the @caller_id semantics. This
would not require any changes to the LOG_CONT implementation or any
other exists buffer preparation code and it would not require and
changes do crash/dump tools. I.e. it would simply be making new fields
available for netconsole.

I can accept that we want to avoid C for now until we solve the
efficient space issue.

(My official preference list is below...)

> 3. It would be nice to optimize the memory usage and store the
>    optional and variable strings (comm, subsystem, device) into
>    data buffer. But it looks like a non-trivial work.
>
>    I would do this only when there is a real demand. And I haven't
>    heard about that the current approach was not acceptable yet.

Well, the whole reason the topic came up is because of a complaint from
Geert. And I think if people knew just how much space was being wasted,
they might speak up. For example, on my Debian distro kernel, I have a
256KB text_data_ring and an 896KB desc_ring. I expect that the total
usage would cut in half if we packed the dev_printk_info data into the
text_data_ring. I can prototype some tests (just to satisfy my
curiousity).

I do not think it would be much work. Whether the strings are copied
from text_data arrays or dev_printk_info structs does not make much
difference. There is only a slightly larger overhead to identify their
existance and offsets. (Currently it is a single memcpy() of
dev_printk_info.)

But we would need to update the crash/dump tools. So I am not in a rush
to implement D right now.

My ordered preferences for right now would be:

1. keeping @caller_id semantics + adding @cpu + adding @comm (similar to
your C)

2. keeping @caller_id semantics + adding @cpu (similar to your A)

These additions would be purely to support netconsole and the additions
would be under CONFIG_NETCONSOLE_DYNAMIC. No crash/dump tools should be
changed due to this.

For the long term I would like to move all strings into the
text_data_ring. We could use that opportunity to add context bits, time
stamps, etc. and then appropriately update the major crash/dump tools.

John