Message-ID: <aMgr0dId_UfBptzW@pathway.suse.cz>
Date: Mon, 15 Sep 2025 17:08:01 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Daniil Tatianin <d-tatianin@...dex-team.ru>,
linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH v2 0/2] printk_ringbuffer: don't needlessly wrap data
blocks around
On Fri 2025-09-12 20:49:37, John Ogness wrote:
> Hi Petr,
>
> Summary: printk() is not in danger but we should correct a loose bounds
> check.
>
> On 2025-09-12, Petr Mladek <pmladek@...e.com> wrote:
> > Honestly, I would really like to limit the maximal record size to
> > 1/4 of the buffer size. I do not want to make the design more
> > complicated just to be able to fill just one record, definitely.
>
> So I was able to track this down. Your usage of
>
> DEFINE_PRINTKRB(test_rb, 4, 4);
>
> actually made it relatively easy because there are only 16
> descriptors. All I needed to do was dump the descriptors before each
> reserve, between reserve and commit, after commit, and when reserve
> fails. This allowed me to easily see exactly how the ringbuffer is
> behaving.
>
> The problem can be reproduced with a single writer, no reader
> needed. Using
>
> #define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
>
> provides a wild range of attempts that trigger the problem within about
> 20 write cycles.
>
> The problem comes from the function data_make_reusable(). The job of
> this function is to push the data_ring tail forward, one data block at a
> time, while setting the related descriptors to reusable.
>
> After pushing the tail forward, if it still has not pushed it far enough
> for new requested reservation, it must push it further. For this it
> _assumes the current position of the tail is a descriptor ID for the
> next data block_. But what if the tail was pushed all the way to the
> head? Then there is no next data block and it will read in garbage,
> thinking it is the next descriptor ID to set reusable. And from there it
> just goes crazy because it is reading garbage to determine how big the
> data block is so that it can continue pushing the tail (beyond the head!).
>
> Example: Assume the 96 byte ringbuffer has a single message of 64
> bytes. Then we try to reserve space for a 72-byte
> message. data_make_reusable() will first set the descriptor of the
> 64-byte message to reusable and push the tail forward to index 64. But
> the new message needs 72 bytes, so data_make_reusable() will keep going
> and read the descriptor ID at index 64, but there is only random garbage
> at that position. 64 is the head and there is nothing valid after it.
Great catch and example!
I wondered why data_make_reusable() needed to push the tail that far.
The buffer was empty after the 64-byte message was freed.

My understanding is that it is a combination of the following effects:

  1. The message is wrapped.

  2. The ring buffer does not support proper wrapping. Instead,
     the insufficient space at the end of the buffer stays
     unused (last wrap), and the message is written
     from the beginning of the buffer (next wrap).

  => the message occupies more space than expected:

       unused space from last wrap + full message size in new wrap

In our case:

  + size of the buffer: 96
  + unused space in old wrap: 96 - 64 = 32
  + occupied space in new wrap: 72

  => total occupied space: 32 + 72 = 104 > 96

  => the lpos passed to data_push_tail() points into never-used space

  => This is why data_push_tail() tries to read a descriptor ID
     from never-used space and reads garbage.
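
To double check the arithmetic above, here is a minimal sketch of the
accounting. It is only a model: the helper names model_next_lpos() and
model_occupied() are made up for illustration, it uses plain
division instead of the kernel's DATA_WRAPS() macros, and it ignores
alignment and the block ID stored in each data block. It also models the
behavior without Daniil's series, so a perfectly fitting block still wraps.

```c
#include <assert.h>

/*
 * Simplified model of how a data block advances the logical position
 * (lpos). This is NOT the kernel's get_next_lpos(): the helper name and
 * the plain division arithmetic are illustrative assumptions.
 */
static unsigned long model_next_lpos(unsigned long ring_size,
				     unsigned long begin_lpos,
				     unsigned long size)
{
	unsigned long next_lpos = begin_lpos + size;

	assert(ring_size > 0);

	/* The block fits in the current wrap: it occupies exactly 'size'. */
	if (begin_lpos / ring_size == next_lpos / ring_size)
		return next_lpos;

	/*
	 * The block wraps: the rest of the current wrap stays unused and
	 * the whole block is placed at the start of the next wrap.
	 */
	return (next_lpos / ring_size) * ring_size + size;
}

/* Space really consumed by the block, including the skipped tail end. */
static unsigned long model_occupied(unsigned long ring_size,
				    unsigned long begin_lpos,
				    unsigned long size)
{
	return model_next_lpos(ring_size, begin_lpos, size) - begin_lpos;
}
```

With the numbers from this thread, model_occupied(96, 64, 72) gives 104,
which is more than the 96 bytes in the buffer.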
> This situation can never happen for printk because of your 1/4 limit
> (MAX_LOG_TAKE_PART), although it is over-conservative.
I would say that it is conservative. It would survive mistakes from the
off-by-one family, ... ;-)

And it is still far from practical limits, because having this
powerful ring buffer for 1, 2, or 4 messages looks like overkill.
> It is enough to limit messages to 1/2 of the data ring
> (with Daniil's series). Otherwise the limit must be
> "1/2 - sizeof(long)" to also leave room for the
> trailing ID of a wrapping data block.
I am not sure why it is important to push it to the limits.
That said, I could live with it, especially now that we
understand what happened.
> I am still positive about Daniil's series.
Yes, the patch which prevents wrapping for perfectly fitting messages
looks good to me.
> And we should fix
> data_check_size() to be provide a proper limit as well as describe the
> critical relationship between data_check_size() and
> data_make_reusable().
Yup.
> I prefer not modify data_make_reusable() to handle this case. Currently
> data_make_reusable() does nothing with the head, so it would introduce
> new memory barriers. Also, the "push tail beyond head" scenario is a bit
> odd to handle. It is better just to document the assumption and put in
> the correct bounds checks.
It might be possible to catch this either in data_alloc()
or in get_next_lpos(). They could ignore or warn about requests where
the really occupied space would be bigger than DATA_SIZE(data_ring).
Something like:
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 17b741b2eccd..d7ba4c0d8c3b 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1056,8 +1056,16 @@ static char *data_alloc(struct printk_ringbuffer *rb, unsigned int size,
 	do {
 		next_lpos = get_next_lpos(data_ring, begin_lpos, size);
 
-		if (!data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) {
-			/* Failed to allocate, specify a data-less block. */
+		/*
+		 * Double check that the really used space won't be bigger than
+		 * the ring buffer. Wrapped messages need to reserve more space,
+		 * see get_next_lpos.
+		 *
+		 * Specify a data-less block when the check or the allocation
+		 * fails.
+		 */
+		if (WARN_ON_ONCE(next_lpos - begin_lpos > DATA_SIZE(data_ring)) ||
+		    !data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) {
 			blk_lpos->begin = FAILED_LPOS;
 			blk_lpos->next = FAILED_LPOS;
 			return NULL;
A similar check would also be needed in data_realloc().

I am not sure if it is worth it. Maybe we could rule this out
by limiting the allocated size to 1/2 or 1/4 of the ring buffer size.
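
As a sanity check of that idea, here is a small brute-force sketch. It
tries every begin index and every size up to a limit and reports the
worst-case occupied space (next_lpos - begin_lpos). Again, this is only a
model under stated assumptions: the helpers model_occupied() and
model_max_occupied() are hypothetical, the 128-byte ring in the usage note
is arbitrary, and alignment and block IDs are ignored. It says nothing
about memory barriers or the real data_push_tail() logic.

```c
#include <assert.h>

/*
 * Model of the space consumed by one block that starts at index
 * 'begin' (0 <= begin < ring_size). A block that reaches or crosses
 * the end of the ring leaves the rest of the wrap unused and is
 * stored in full at the start of the next wrap. Illustrative only.
 */
static unsigned long model_occupied(unsigned long ring_size,
				    unsigned long begin,
				    unsigned long size)
{
	unsigned long next = begin + size;

	assert(ring_size > 0 && begin < ring_size);

	/* No wrap: the block consumes exactly its own size. */
	if (next / ring_size == 0)
		return size;

	/* Wrap: skipped tail of the old wrap plus the full block. */
	return (next / ring_size) * ring_size + size - begin;
}

/* Worst case over all begin indexes and all sizes in 1..max_size. */
static unsigned long model_max_occupied(unsigned long ring_size,
					unsigned long max_size)
{
	unsigned long worst = 0;
	unsigned long begin, size, occupied;

	for (begin = 0; begin < ring_size; begin++) {
		for (size = 1; size <= max_size; size++) {
			occupied = model_occupied(ring_size, begin, size);
			if (occupied > worst)
				worst = occupied;
		}
	}

	return worst;
}
```

In this model, a 128-byte ring with a 1/2 limit gives
model_max_occupied(128, 64) == 128 and with a 1/4 limit gives
model_max_occupied(128, 32) == 64, so the extra check would never
trigger. One byte above half, model_max_occupied(128, 65) == 130,
already exceeds the ring.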
Best Regards,
Petr