Message-ID: <aMQ5DdY41jlftEZn@pathway.suse.cz>
Date: Fri, 12 Sep 2025 17:15:25 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Daniil Tatianin <d-tatianin@...dex-team.ru>,
linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH v2 0/2] printk_ringbuffer: don't needlessly wrap data
blocks around
On Fri 2025-09-12 16:49:53, Petr Mladek wrote:
> On Fri 2025-09-12 11:55:02, Petr Mladek wrote:
> > On Fri 2025-09-12 11:25:09, Petr Mladek wrote:
> > > On Thu 2025-09-11 18:18:32, John Ogness wrote:
> > > > On 2025-09-11, Petr Mladek <pmladek@...e.com> wrote:
> > > > > diff --git a/kernel/printk/printk_ringbuffer_kunit_test.c b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > index 2282348e869a..241f7ef49ac6 100644
> > > > > --- a/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > +++ b/kernel/printk/printk_ringbuffer_kunit_test.c
> > > > > @@ -56,7 +56,7 @@ struct prbtest_rbdata {
> > > > > char text[] __counted_by(size);
> > > > > };
> > > > >
> > > > > -#define MAX_RBDATA_TEXT_SIZE 0x80
> > > > > +#define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
> > > >
> > > > I guess this should be:
> > > >
> > > > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> > >
> > > Great catch!
> > >
> > > But the KUnit test fails even with this change, see below. And I am
> > > not surprised: the test should work even with larger-than-allowed
> > > messages. prbtest_writer() should simply skip them because
> > > prb_reserve() should fail.
> > >
> > > Here is test result with:
> > >
> > > #define MAX_RBDATA_TEXT_SIZE (256 - sizeof(struct prbtest_rbdata))
> > > #define MAX_PRB_RECORD_SIZE (sizeof(struct prbtest_rbdata) + MAX_RBDATA_TEXT_SIZE)
> > >
> > > DEFINE_PRINTKRB(test_rb, 4, 4);
> > >
> > > and with this patchset reverted, aka, sources from
> > > printk/linux.git, branch for-next:
> > >
> > > It is well reproducible. It always fails after reading a few records.
> > > Here are results from a few other runs:
> >
> > And I am no longer able to reproduce it after limiting the size
> > of the record to 1/4 of the data buffer size. I did it with
> > the following change:
> >
> > diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
> > index bc811de18316..2f02254705aa 100644
> > --- a/kernel/printk/printk_ringbuffer.c
> > +++ b/kernel/printk/printk_ringbuffer.c
> > @@ -398,8 +398,6 @@ static unsigned int to_blk_size(unsigned int size)
> > */
> > static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> > {
> > - struct prb_data_block *db = NULL;
> > -
> > if (size == 0)
> > return true;
> >
> > @@ -409,7 +407,7 @@ static bool data_check_size(struct prb_data_ring *data_ring, unsigned int size)
> > * at least the ID of the next block.
> > */
> > size = to_blk_size(size);
> > - if (size > DATA_SIZE(data_ring) - sizeof(db->id))
> > + if (size > DATA_SIZE(data_ring) / 4)
> > return false;
> >
> > return true;
> >
> >
> > I guess that there is a race when we need to make all existing records
> > reusable when making space for the next one.
>
> It looks to me that the reader API is not ready to handle the
> situation when all records are "reusable".
>
> At least, it looks like prb_next_seq() might end up in an "infinite"
> loop because it blindly increments "seq" until it finds a valid record...
>
> Honestly, I would really like to limit the maximal record size to
> 1/4 of the buffer size. I definitely do not want to make the design
> more complicated just to be able to fit a single record.
>
>
> That said, I am still a bit nervous because I do not understand why
> the KUnit test fails. It does not depend on prb_next_seq().
>
> It seems that prb_read_valid() returns success even when it
> returns garbage. I have added some debug output using
> trace_printk():
>
> BTW2: It seems that even writers had problems reserving space this
> time. Only a few of them succeeded a few times out of many attempts.
> I got the numbers with this change:
This actually helped. It seems that even prb_read_valid() blindly
increments seq when the last entry is reusable
(desc_read_finalized_seq() returns -ENOENT).
And the failed entries are ones which were never written.
Here is another output where it is easier to see:
[ 241.991189] KTAP version 1
[ 241.991511] # Subtest: printk-ringbuffer
[ 241.991912] # module: printk_ringbuffer_kunit_test
[ 241.991941] 1..1
[ 241.993854] # test_readerwriter: running for 10000 ms
[ 241.995189] # test_readerwriter: start thread 001 (writer)
[ 241.995451] # test_readerwriter: start thread 002 (writer)
[ 241.995722] # test_readerwriter: start thread 003 (writer)
[ 241.997300] # test_readerwriter: start thread 004 (writer)
[ 241.997947] # test_readerwriter: start thread 005 (writer)
[ 241.998351] # test_readerwriter: start thread 006 (writer)
[ 241.998686] # test_readerwriter: start thread 007 (writer)
[ 241.998971] # test_readerwriter: start thread 008 (writer)
[ 242.001305] # test_readerwriter: start thread 009 (writer)
[ 242.001984] # test_readerwriter: start thread 010 (writer)
[ 242.002680] # test_readerwriter: start thread 011 (writer)
[ 242.002686] # test_readerwriter: starting test
[ 242.009239] # test_readerwriter: start reader
[ 242.009940] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=485 size=4325444 text=
[ 242.011676] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=487 size=1145324612 text=
[ 242.013569] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=490 size=4325444 text=
[ 242.015316] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=491 size=61 text=DDDDDDDDDDDDDDDDDDDBBBBBBBBBBB
[ 242.017483] # test_readerwriter: EXPECTATION FAILED at kernel/printk/printk_ringbuffer_kunit_test.c:80
BAD RECORD: seq=492 size=1145324612 text=
[ 252.221105] # test_readerwriter: end reader: read=14 seq=496
[ 252.221918] # test_readerwriter: completed test
[ 252.225842] # test_readerwriter: end thread 011: wrote=0 by attempts=9760048
[ 252.226808] # test_readerwriter: end thread 010: wrote=0 by attempts=9764865
[ 252.227766] # test_readerwriter: end thread 009: wrote=0 by attempts=9768991
[ 252.228507] # test_readerwriter: end thread 008: wrote=0 by attempts=12716826
[ 252.229283] # test_readerwriter: end thread 007: wrote=0 by attempts=12674260
[ 252.230046] # test_readerwriter: end thread 006: wrote=0 by attempts=9769229
[ 252.230880] # test_readerwriter: end thread 005: wrote=0 by attempts=12716512
[ 252.231639] # test_readerwriter: end thread 004: wrote=0 by attempts=12627682
[ 252.232390] # test_readerwriter: end thread 003: wrote=364 by attempts=9766282
[ 252.233222] # test_readerwriter: end thread 002: wrote=0 by attempts=12710049
[ 252.233970] # test_readerwriter: end thread 001: wrote=123 by attempts=9786928
[ 252.234758] # test_readerwriter.speed: slow
[ 252.234792] not ok 1 test_readerwriter
Only two writer threads succeeded in writing a few messages:
364 + 123 = 487
And the failed records have sequence numbers: 485, 487, 490, 491, 492.
I am not sure about 485 and 487, but the other three read entries
were never written.
It is possible that the 485 and 487 entries were read before they
were written, that is, the reader failed before the writers did.
I am not sure why the writers failed. But I guess that they also
somehow depend on the fact that at least the oldest entry is
always finalized. I vaguely recall that we relied on this
invariant at some point.
Best Regards,
Petr