linux-kernel - Re: pstore/ram: printk: NULL characters in pstore ramoops area

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3f62313a-438e-b65c-0a8b-f4cdadda4571@linux.microsoft.com>
Date:   Tue, 8 Aug 2023 18:21:46 -0700
From:   Vijay Balakrishna <vijayb@...ux.microsoft.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        John Ogness <john.ogness@...utronix.de>,
        Kees Cook <keescook@...omium.org>,
        Colin Cross <ccross@...roid.com>,
        Tony Luck <tony.luck@...el.com>,
        Anton Vorontsov <anton@...msg.org>,
        linux-kernel@...r.kernel.org
Subject: Re: pstore/ram: printk: NULL characters in pstore ramoops area

Thanks for your reply Petr.

See inline.

On 8/8/23 01:15, Petr Mladek wrote:
> On Mon 2023-08-07 10:19:07, Vijay Balakrishna wrote:
>> I'm including my earlier email as it didn't deliver to
>> linux-kernel@...r.kernel.org due to HTML subpart.  Also sharing new findings
>> --
>>
>> Limiting the size of buffer exposed to record_print_text() and
>> prb_for_each_record() in kmsg_dump_get_buffer() also resolves this issue [5]
>> -- no NULL characters in pstore/ramoops memory.  The advantage is no memory
>> allocation (as done in previously shared changes [4]) which could be
>> problematic during kernel shutdown/reboot or during kexec reboot.
>>
>> [5]
>>
>> Author: Vijay Balakrishna <vijayb@...ux.microsoft.com>
>> Date:   Sat Aug 5 18:47:27 2023 +0000
>>
>>      printk: limit the size of buffer exposed to record_print_text() by
>> kmsg_dump_get_buffer()
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index b82e4c3b52f4..8feec932aa35 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3453,9 +3453,9 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper,
>> bool syslog,
>>           */
>>          next_seq = seq;
>>
>> -       prb_rec_init_rd(&r, &info, buf, size);
>>
>>          len = 0;
>> +       prb_rec_init_rd(&r, &info, buf + len, (size - len) >= LOG_LINE_MAX +
>> PREFIX_MAX ? LOG_LINE_MAX + PREFIX_MAX : size - len);
>>          prb_for_each_record(seq, prb, seq, &r) {
>>                  if (r.info->seq >= dumper->next_seq)
>>                          break;
>> @@ -3463,7 +3463,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper,
>> bool syslog,
>>                  len += record_print_text(&r, syslog, time);
>>
>>                  /* Adjust record to store to remaining buffer space. */
>> -               prb_rec_init_rd(&r, &info, buf + len, size - len);
>> +               prb_rec_init_rd(&r, &info, buf + len, (size - len) >=
>> LOG_LINE_MAX + PREFIX_MAX ? LOG_LINE_MAX + PREFIX_MAX : size - len);
>>          }
>>
>>          dumper->next_seq = next_seq;

Any comments on above change to limit buffer size/range exposed?

The buffer passed to kmsg_dump_get_buffer() is kzalloc()'ed in 
fs/pstore/ram.c: ramoops_probe()

                 cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);

that may explain NULL characters in buffer.

Output seen after adding NULL character check before calling 
record_print_text() suggest the issue may not be in record_print_text().

(diffs below are from v5.10.178)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 17a310dcb6d9..4cbb845424f5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3431,10 +3431,17 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper 
*dumper, bool syslog,
         /* actually read text into the buffer now */
         l = 0;
         while (prb_read_valid(prb, seq, &r)) {
+               size_t pr_len;
                 if (r.info->seq >= dumper->next_seq)
                         break;

-               l += record_print_text(&r, syslog, time);
+               if (null_char(r.text_buf, r.info->text_len))
+                       pr_crit("%s: pre record_print_text() a NULL char 
in record text_buf, l %zu, textlen %zu\n", __func__, l, r.info->text_len);
+               pr_len = record_print_text(&r, syslog, time);
+               l += pr_len;
+
+               if (null_char(r.text_buf, pr_len))
+                       pr_crit("%s: post record_print_text() a NULL 
char in record text_buf, l %zu, pr_len %zu\n", __func__, l, pr_len);

                 /* adjust record to store to remaining buffer space */
                 prb_rec_init_rd(&r, &info, buf + l, size - l);

(output during warm reboot)

[   12.400817] ramoops: ramoops_pstore_write: A NULL char in record buf, 
size 70648
[   12.408217] printk: kmsg_dump_get_buffer: pre record_print_text() a 
NULL char in record text_buf, l 65505, textlen 53
[   12.408219] printk: kmsg_dump_get_buffer: post record_print_text() a 
NULL char in record text_buf, l 65578, pr_len 73

As I replied to Kees last week, we will try to repro with 
modern/mainline kernel hopefully soon.

>
> I looks like some problems with counting data that fit into the
> buffer. I see that several fixes were added after 5.10 release.
> I wonder if they help to solve this:
>
> commit 89ccf18f032f26946 ("printk: fix kmsg_dump_get_buffer length calulations")
> commit f0e386ee0c0b71ea6 ("printk: fix buffer overflow potential for print_text()")
> commit 08d60e59995401105 ("printk: fix string termination for record_print_text()")

Above commit changes are already in v5.10.187 kernel where we are seeing 
the NULL characters in ramoops.

>
> All 3 commits were backported into 5.10 stable.
>
> The 2nd commit without the 3rd one might cause writing an extra "\0"
> into a wrong place.
>
>
>> On 8/3/23 16:34, Vijay Balakrishna wrote:
>>> Hello,
>>>
>>> We are noticing NULL characters in ramoops/pstore memory after a warm or
>>> a kexec reboot [1] in our 5.10 ARM64 product kernel after moving from
>>> 5.4 kernel.  I ruled out fs/pstore/* as the source from where NULL
>>> characters originate by adding debug code [2] and confirming from
>>> collected output [3].  Then isolated further to printk log/ring buffer
>>> area, the NULL characters were already present in buffer in
>>> kmsg_dump_get_buffer() when kmsg log lines are read.  After looking at
>>> printk merges in mainline kernel, I cherry-picked following which looked
>>> related to our 5.10 kernel and still see NULL characters.
>>>
>>>      4260e0e5510158d704898603331e5365ebe957de printk: consolidate
>>>      kmsg_dump_get_buffer/syslog_print_all code
>>>      726b5097701a8d46f5354be780e1a11fc4ca1187 printk: refactor
>>>      kmsg_dump_get_buffer()
>>>      bb07b16c44b2c6ddbafa44bb06454719002e828e printk: limit second loop
>>>      of syslog_print_all
> These commits tried to reduce a code duplication between kmsg_dump
> and syslog API.
>
>>> Looking at syslog_print_all() I notice it uses a local buffer unlike
>>> kmsg_dump_get_buffer() which manipulates buffer in-place to add syslog
>>> and timestamp prefix data.
> syslog_print_all() gets a buffer from userspace. It can be written
> only by copy_to_user(). It allocates an extra buffer so that it could
> do all the message formatting in the kernel space.
>
>> I made changes [4] to kmsg_dump_get_buffer()
>>> to use a local buffer similar to syslog_print_all() after which I don't
>>> see NULL characters in ramoops area.  I couldn't spot any suspicious
>>> code in record_print_text() where prefix data added in-place.  I'm
>>> reaching out to both pstore/ram and printk folks for comments. I can
>>> investigate/debug further with assistance and input from you.
> It is more safe with the extra buffer. It is always used only
> for one message. It is possible that the NULL character was
> also written in a wrong place there. But it did not affect
> the buffer passed to kmsg_dump_get_buffer().
>
> I hope that the above three commit fixing the length calculation
> and potential buffer overflow will fix this.
>
> Anyway, thanks a lot for debugging this and providing all the details.
Thanks,
VIjay
>
> Best Regards,
> Petr