Message-ID: <aMQ5DdY41jlftEZn@pathway.suse.cz>
Date: Fri, 12 Sep 2025 17:15:25 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Daniil Tatianin <d-tatianin@...dex-team.ru>,
linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH v2 0/2] printk_ringbuffer: don't needlessly wrap data
blocks around
On Fri 2025-09-12 16:49:53, Petr Mladek wrote:
> On Fri 2025-09-12 11:55:02, Petr Mladek wrote:
> > On Fri 2025-09-12 11:25:09, Petr Mladek wrote:
> > > On Thu 2025-09-11 18:18:32, John Ogness wrote:
> > > > On 2025-09-11, Petr Mladek <pmladek@...e.com> wrote:
> > > > > diff --git a/kernel/printk/printk_ringbuffer_kunit_test.c b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > index 2282348e869a..241f7ef49ac6 100644
> > > > > --- a/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > +++ b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > @@ -56,7 +56,7 @@ struct prbtest_rbdata {
> > > > > char text[] __counted_by(size);
> > > > > };
> > > > >
> > > > > -#define MAX_RBDATA_TEXT_SIZE 0x80
> > > > > +#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
> > > >
> > > > I guess this should be:
> > > >
> > > > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> > >
> > > Great catch!
> > >
> > > But the KUnit test fails even with this change, see below. And I am
> > > not surprised: the test should work even with larger-than-allowed
> > > messages. prbtest_writer() should simply skip them because
> > > prb_reserve() should fail.
> > >
> > > Here is test result with:
> > >
> > > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> > > #define MAX_PRB_RECORD_SIZE (sizeof(struct prbtest_rbdata) + MAX_RBDATA_TEXT_SIZE)
> > >
> > > DEFINE_PRINTKRB(test_rb, 4, 4);
> > >
> > > and with this patchset reverted, aka, sources from
> > > printk/linux.git, branch for-next:
> > >
> > > It is well reproducible. It always fails after reading a few records.
> > > Here are results from a few other runs:
> >
> > And I am no longer able to reproduce it after limiting the size
> > of the record to 1/4 of the data buffer size. I did it with
> > the following change:
> >
> > diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> > index bc811de18316..2f02254705aa 100644
> > --- a/kernel/printk/printk_ringbuffer.c
> > +++ b/kernel/printk/printk_ringbuffer.c
> > @@ -398,8 +398,6 @@ static unsigned int to_blk_size(unsigned int size)
> > */
> > static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> > {
> > - struct prb_data_block *db = NULL;
> > -
> > if (size == 0)
> > return true;
> >
> > @@ -409,7 +407,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> > * at least the ID of the next block.
> > */
> > size = to_blk_size(size);
> > - if (size > DATA_SIZE(data_ring) - sizeof(db->id))
> > + if (size > DATA_SIZE(data_ring) / 4)
> > return false;
> >
> > return true;
> >
> >
> > I guess that there is a race when we need to make all existing records
> > reusable when making space for the next one.
>
> It looks to me that the reader API is not ready to handle the
> situation when all records are "reusable".
>
> At least, it looks like prb_next_seq() might end up in an "infinite"
> loop because it blindly increments "seq" until it finds a valid record...
>
> Honestly, I would really like to limit the maximal record size to
> 1/4 of the buffer size. I definitely do not want to make the design
> more complicated just to be able to fit a single record.
>
>
> That said, I am still a bit nervous because I do not understand why
> the KUnit test fails. It does not depend on prb_next_seq().
>
> It seems that prb_read_valid() returns success even when it
> returns garbage. I have added some debug output using
> trace_printk():
>
> BTW2: It seems that even writers had problems reserving space this
> time. Only a few of them succeeded a few times out of many attempts.
> I got the numbers with this change:
This actually helped. It seems that even prb_read_valid() blindly
increments seq when the last entry is reusable
(desc_read_finalized_seq() returns -ENOENT).
And the failed entries are ones which were never written.
Here is another output where it is easier to see:
[ 241.991189] KTAP version 1
[ 241.991511] # Subtest: printk-ringbuffer
[ 241.991912] # module: printk_ringbuffer_kunit_test
[ 241.991941] 1..1
[ 241.993854] # test_readerwriter: running for 10000 ms
[ 241.995189] # test_readerwriter: start thread 001 (writer)
[ 241.995451] # test_readerwriter: start thread 002 (writer)
[ 241.995722] # test_readerwriter: start thread 003 (writer)
[ 241.997300] # test_readerwriter: start thread 004 (writer)
[ 241.997947] # test_readerwriter: start thread 005 (writer)
[ 241.998351] # test_readerwriter: start thread 006 (writer)
[ 241.998686] # test_readerwriter: start thread 007 (writer)
[ 241.998971] # test_readerwriter: start thread 008 (writer)
[ 242.001305] # test_readerwriter: start thread 009 (writer)
[ 242.001984] # test_readerwriter: start thread 010 (writer)
[ 242.002680] # test_readerwriter: start thread 011 (writer)
[ 242.002686] # test_readerwriter: starting test
[ 242.009239] # test_readerwriter: start reader
[ 242.009940] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=485 size=4325444 text=
[ 242.011676] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=487 size=1145324612 text=
[ 242.013569] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=490 size=4325444 text=
[ 242.015316] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=491 size=61 text=DDDDDDDDDDDDDDDDDDDBBBBBBBBBBB
[ 242.017483] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=492 size=1145324612 text=
[ 252.221105] # test_readerwriter: end reader: read=14 seq=496
[ 252.221918] # test_readerwriter: completed test
[ 252.225842] # test_readerwriter: end thread 011: wrote=0 by attempts=9760048
[ 252.226808] # test_readerwriter: end thread 010: wrote=0 by attempts=9764865
[ 252.227766] # test_readerwriter: end thread 009: wrote=0 by attempts=9768991
[ 252.228507] # test_readerwriter: end thread 008: wrote=0 by attempts=12716826
[ 252.229283] # test_readerwriter: end thread 007: wrote=0 by attempts=12674260
[ 252.230046] # test_readerwriter: end thread 006: wrote=0 by attempts=9769229
[ 252.230880] # test_readerwriter: end thread 005: wrote=0 by attempts=12716512
[ 252.231639] # test_readerwriter: end thread 004: wrote=0 by attempts=12627682
[ 252.232390] # test_readerwriter: end thread 003: wrote=364 by attempts=9766282
[ 252.233222] # test_readerwriter: end thread 002: wrote=0 by attempts=12710049
[ 252.233970] # test_readerwriter: end thread 001: wrote=123 by attempts=9786928
[ 252.234758] # test_readerwriter.speed: slow
[ 252.234792] not ok 1 test_readerwriter
Only two writer threads succeeded in writing a few messages:
364 + 123 = 487
And the failed records have sequence numbers: 485, 487, 490, 491, 492.
I am not sure about 485 and 487, but the other three read entries
were never written.
It is possible that the 485 and 487 entries were read before they
were written, that is, the reader failed before the writers did.
I am not sure why the writers failed. But I guess that they also
somehow depend on the fact that at least the oldest entry is
always finalized. I vaguely recall that we relied on this
invariant at some point.
Best Regards,
Petr