lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <202308040053.7F38C6D@keescook>
Date:   Fri, 4 Aug 2023 00:59:54 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Vijay Balakrishna <vijayb@...ux.microsoft.com>
Cc:     Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        John Ogness <john.ogness@...utronix.de>,
        Colin Cross <ccross@...roid.com>,
        Tony Luck <tony.luck@...el.com>,
        Anton Vorontsov <anton@...msg.org>,
        linux-kernel@...r.kernel.org
Subject: Re: pstore/ram: printk: NULL characters in pstore ramoops area

On Thu, Aug 03, 2023 at 04:34:09PM -0700, Vijay Balakrishna wrote:
> Hello,
> 
> We are noticing NULL characters in ramoops/pstore memory after a warm or a
> kexec reboot [1] in our 5.10 ARM64 product kernel after moving from 5.4
> kernel.  I ruled out fs/pstore/* as the source from where NULL characters
> originate by adding debug code [2] and confirming from collected output
> [3].  Then isolated further to printk log/ring buffer area, the NULL
> characters were already present in buffer in kmsg_dump_get_buffer() when
> kmsg log lines are read.  After looking at printk merges in mainline kernel,
> I cherry-picked following which looked related to our 5.10 kernel and still
> see NULL characters.
> 
>    4260e0e5510158d704898603331e5365ebe957de printk: consolidate
>    kmsg_dump_get_buffer/syslog_print_all code
>    726b5097701a8d46f5354be780e1a11fc4ca1187 printk: refactor
>    kmsg_dump_get_buffer()
>    bb07b16c44b2c6ddbafa44bb06454719002e828e printk: limit second loop
>    of syslog_print_all

Do you mean that you took a working v5.4 kernel and backported the above
3 commits and it starting showing the %NUL characters?

> [...]
> diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
> index ade66dbe5f39..1825972151b2 100644
> --- a/fs/pstore/ram.c
> +++ b/fs/pstore/ram.c
> @@ -383,6 +383,10 @@ static int notrace ramoops_pstore_write(struct
> pstore_record *record)
>      size = record->size;
>      if (size + hlen > prz->buffer_size)
>          size = prz->buffer_size - hlen;
> +    if (null_char(record->buf, size))
> +        pr_crit("%s: A NULL char in record buf, size %zu\n", __func__,
> size);
> +    else
> +        pr_crit("%s: No NULL char in record buf, size %zu\n", __func__,
> size);
>      persistent_ram_write(prz, record->buf, size);
> [...]
> root@...alhost:~# reboot
> [ 2188.073362] systemd-shutdown[1]: Could not detach loopback /dev/loop1: Device or resource busy
> [ 2188.082272] systemd-shutdown[1]: Could not detach loopback /dev/loop0: Device or resource busy
> [ 2188.091873] watchdog: watchdog0: watchdog did not stop!
> [ 2188.099227] systemd-shutdown[1]: Failed to finalize loop devices, DM devices, ignoring.
> [ 2188.306671] reboot: Restarting system
> [ 2188.316932] ramoops: ramoops_pstore_write: A NULL char in record buf, size 88190

Well that does seem pretty definitive that it's a problem with the
printk/kmsg infrastructure: the %NULs are present in the buffer being
handed to pstore. :(

I have had a growing suspicion that there is a hard-to-find memory
corruption issue with recent printk work (seen during early-boot UBSAN
reporting), but v5.10 is pretty old, so it's probably not related.

Is the issue present in modern kernels?

-Kees

-- 
Kees Cook

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ