[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3f62313a-438e-b65c-0a8b-f4cdadda4571@linux.microsoft.com>
Date: Tue, 8 Aug 2023 18:21:46 -0700
From: Vijay Balakrishna <vijayb@...ux.microsoft.com>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
John Ogness <john.ogness@...utronix.de>,
Kees Cook <keescook@...omium.org>,
Colin Cross <ccross@...roid.com>,
Tony Luck <tony.luck@...el.com>,
Anton Vorontsov <anton@...msg.org>,
linux-kernel@...r.kernel.org
Subject: Re: pstore/ram: printk: NULL characters in pstore ramoops area
Thanks for your reply Petr.
See inline.
On 8/8/23 01:15, Petr Mladek wrote:
> On Mon 2023-08-07 10:19:07, Vijay Balakrishna wrote:
>> I'm including my earlier email as it didn't deliver to
>> linux-kernel@...r.kernel.org due to HTML subpart. Also sharing new findings
>> --
>>
>> Limiting the size of buffer exposed to record_print_text() and
>> prb_for_each_record() in kmsg_dump_get_buffer() also resolves this issue [5]
>> -- no NULL characters in pstore/ramoops memory. The advantage is no memory
>> allocation (as done in previously shared changes [4]) which could be
>> problematic during kernel shutdown/reboot or during kexec reboot.
>>
>> [5]
>>
>> Author: Vijay Balakrishna <vijayb@...ux.microsoft.com>
>> Date: Sat Aug 5 18:47:27 2023 +0000
>>
>> printk: limit the size of buffer exposed to record_print_text() by
>> kmsg_dump_get_buffer()
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index b82e4c3b52f4..8feec932aa35 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3453,9 +3453,9 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper,
>> bool syslog,
>> */
>> next_seq = seq;
>>
>> - prb_rec_init_rd(&r, &info, buf, size);
>>
>> len = 0;
>> + prb_rec_init_rd(&r, &info, buf + len, (size - len) >= LOG_LINE_MAX +
>> PREFIX_MAX ? LOG_LINE_MAX + PREFIX_MAX : size - len);
>> prb_for_each_record(seq, prb, seq, &r) {
>> if (r.info->seq >= dumper->next_seq)
>> break;
>> @@ -3463,7 +3463,7 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper,
>> bool syslog,
>> len += record_print_text(&r, syslog, time);
>>
>> /* Adjust record to store to remaining buffer space. */
>> - prb_rec_init_rd(&r, &info, buf + len, size - len);
>> + prb_rec_init_rd(&r, &info, buf + len, (size - len) >=
>> LOG_LINE_MAX + PREFIX_MAX ? LOG_LINE_MAX + PREFIX_MAX : size - len);
>> }
>>
>> dumper->next_seq = next_seq;
Any comments on above change to limit buffer size/range exposed?
The buffer passed to kmsg_dump_get_buffer() is kzalloc()'ed in
fs/pstore/ram.c: ramoops_probe()
cxt->pstore.buf = kzalloc(cxt->pstore.bufsize, GFP_KERNEL);
that may explain NULL characters in buffer.
Output seen after adding NULL character check before calling
record_print_text() suggest the issue may not be in record_print_text().
(diffs below are from v5.10.178)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 17a310dcb6d9..4cbb845424f5 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3431,10 +3431,17 @@ bool kmsg_dump_get_buffer(struct kmsg_dumper
*dumper, bool syslog,
/* actually read text into the buffer now */
l = 0;
while (prb_read_valid(prb, seq, &r)) {
+ size_t pr_len;
if (r.info->seq >= dumper->next_seq)
break;
- l += record_print_text(&r, syslog, time);
+ if (null_char(r.text_buf, r.info->text_len))
+ pr_crit("%s: pre record_print_text() a NULL char
in record text_buf, l %zu, textlen %zu\n", __func__, l, r.info->text_len);
+ pr_len = record_print_text(&r, syslog, time);
+ l += pr_len;
+
+ if (null_char(r.text_buf, pr_len))
+ pr_crit("%s: post record_print_text() a NULL
char in record text_buf, l %zu, pr_len %zu\n", __func__, l, pr_len);
/* adjust record to store to remaining buffer space */
prb_rec_init_rd(&r, &info, buf + l, size - l);
(output during warm reboot)
[ 12.400817] ramoops: ramoops_pstore_write: A NULL char in record buf,
size 70648
[ 12.408217] printk: kmsg_dump_get_buffer: pre record_print_text() a
NULL char in record text_buf, l 65505, textlen 53
[ 12.408219] printk: kmsg_dump_get_buffer: post record_print_text() a
NULL char in record text_buf, l 65578, pr_len 73
As I replied to Kees last week, we will try to repro with
modern/mainline kernel hopefully soon.
>
> I looks like some problems with counting data that fit into the
> buffer. I see that several fixes were added after 5.10 release.
> I wonder if they help to solve this:
>
> commit 89ccf18f032f26946 ("printk: fix kmsg_dump_get_buffer length calulations")
> commit f0e386ee0c0b71ea6 ("printk: fix buffer overflow potential for print_text()")
> commit 08d60e59995401105 ("printk: fix string termination for record_print_text()")
Above commit changes are already in v5.10.187 kernel where we are seeing
the NULL characters in ramoops.
>
> All 3 commits were backported into 5.10 stable.
>
> The 2nd commit without the 3rd one might cause writing an extra "\0"
> into a wrong place.
>
>
>> On 8/3/23 16:34, Vijay Balakrishna wrote:
>>> Hello,
>>>
>>> We are noticing NULL characters in ramoops/pstore memory after a warm or
>>> a kexec reboot [1] in our 5.10 ARM64 product kernel after moving from
>>> 5.4 kernel. I ruled out fs/pstore/* as the source from where NULL
>>> characters originate by adding debug code [2] and confirming from
>>> collected output [3]. Then isolated further to printk log/ring buffer
>>> area, the NULL characters were already present in buffer in
>>> kmsg_dump_get_buffer() when kmsg log lines are read. After looking at
>>> printk merges in mainline kernel, I cherry-picked following which looked
>>> related to our 5.10 kernel and still see NULL characters.
>>>
>>> 4260e0e5510158d704898603331e5365ebe957de printk: consolidate
>>> kmsg_dump_get_buffer/syslog_print_all code
>>> 726b5097701a8d46f5354be780e1a11fc4ca1187 printk: refactor
>>> kmsg_dump_get_buffer()
>>> bb07b16c44b2c6ddbafa44bb06454719002e828e printk: limit second loop
>>> of syslog_print_all
> These commits tried to reduce a code duplication between kmsg_dump
> and syslog API.
>
>>> Looking at syslog_print_all() I notice it uses a local buffer unlike
>>> kmsg_dump_get_buffer() which manipulates buffer in-place to add syslog
>>> and timestamp prefix data.
> syslog_print_all() gets a buffer from userspace. It can be written
> only by copy_to_user(). It allocates an extra buffer so that it could
> do all the message formatting in the kernel space.
>
>> I made changes [4] to kmsg_dump_get_buffer()
>>> to use a local buffer similar to syslog_print_all() after which I don't
>>> see NULL characters in ramoops area. I couldn't spot any suspicious
>>> code in record_print_text() where prefix data added in-place. I'm
>>> reaching out to both pstore/ram and printk folks for comments. I can
>>> investigate/debug further with assistance and input from you.
> It is more safe with the extra buffer. It is always used only
> for one message. It is possible that the NULL character was
> also written in a wrong place there. But it did not affect
> the buffer passed to kmsg_dump_get_buffer().
>
> I hope that the above three commit fixing the length calculation
> and potential buffer overflow will fix this.
>
> Anyway, thanks a lot for debugging this and providing all the details.
Thanks,
VIjay
>
> Best Regards,
> Petr
Powered by blists - more mailing lists