[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <202308040053.7F38C6D@keescook>
Date: Fri, 4 Aug 2023 00:59:54 -0700
From: Kees Cook <keescook@...omium.org>
To: Vijay Balakrishna <vijayb@...ux.microsoft.com>
Cc: Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
John Ogness <john.ogness@...utronix.de>,
Colin Cross <ccross@...roid.com>,
Tony Luck <tony.luck@...el.com>,
Anton Vorontsov <anton@...msg.org>,
linux-kernel@...r.kernel.org
Subject: Re: pstore/ram: printk: NULL characters in pstore ramoops area
On Thu, Aug 03, 2023 at 04:34:09PM -0700, Vijay Balakrishna wrote:
> Hello,
>
> We are noticing NULL characters in ramoops/pstore memory after a warm or a
> kexec reboot [1] in our 5.10 ARM64 product kernel after moving from 5.4
> kernel. I ruled out fs/pstore/* as the source from where NULL characters
> originate by adding debug code [2] and confirming from collected output
> [3]. Then isolated further to printk log/ring buffer area, the NULL
> characters were already present in buffer in kmsg_dump_get_buffer() when
> kmsg log lines are read. After looking at printk merges in mainline kernel,
> I cherry-picked following which looked related to our 5.10 kernel and still
> see NULL characters.
>
> 4260e0e5510158d704898603331e5365ebe957de printk: consolidate
> kmsg_dump_get_buffer/syslog_print_all code
> 726b5097701a8d46f5354be780e1a11fc4ca1187 printk: refactor
> kmsg_dump_get_buffer()
> bb07b16c44b2c6ddbafa44bb06454719002e828e printk: limit second loop
> of syslog_print_all
Do you mean that you took a working v5.4 kernel and backported the above
3 commits and it starting showing the %NUL characters?
> [...]
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index ade66dbe5f39..1825972151b2 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -383,6 +383,10 @@ static int notrace ramoops_pstore_write(struct
> pstore_record *record)
> size = record->size;
> if (size + hlen > prz->buffer_size)
> size = prz->buffer_size - hlen;
> + if (null_char(record->buf, size))
> + pr_crit("%s: A NULL char in record buf, size %zu\n", __func__,
> size);
> + else
> + pr_crit("%s: No NULL char in record buf, size %zu\n", __func__,
> size);
> persistent_ram_write(prz, record->buf, size);
> [...]
> root@...alhost:~# reboot
> [ 2188.073362] systemd-shutdown[1]: Could not detach loopback /dev/loop1: Device or resource busy
> [ 2188.082272] systemd-shutdown[1]: Could not detach loopback /dev/loop0: Device or resource busy
> [ 2188.091873] watchdog: watchdog0: watchdog did not stop!
> [ 2188.099227] systemd-shutdown[1]: Failed to finalize loop devices, DM devices, ignoring.
> [ 2188.306671] reboot: Restarting system
> [ 2188.316932] ramoops: ramoops_pstore_write: A NULL char in record buf, size 88190
Well that does seem pretty definitive that it's a problem with the
printk/kmsg infrastructure: the %NULs are present in the buffer being
handed to pstore. :(
I have had a growing suspicion that there is a hard-to-find memory
corruption issue with recent printk work (seen during early-boot UBSAN
reporting), but v5.10 is pretty old, so it's probably not related.
Is the issue present in modern kernels?
-Kees
--
Kees Cook
Powered by blists - more mailing lists