[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87a5p6g96w.fsf@jogness.linutronix.de>
Date: Mon, 15 Jan 2024 11:58:39 +0106
From: John Ogness <john.ogness@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>, Steven Rostedt
<rostedt@...dmis.org>, Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, Francesco Dolcini <francesco@...cini.it>,
kernel test robot <oliver.sang@...el.com>, Sebastian Andrzej Siewior
<bigeasy@...utronix.de>
Subject: Re: [PATCH printk v3 02/14] printk: Adjust mapping for 32bit seq
macros
On 2024-01-12, Petr Mladek <pmladek@...e.com> wrote:
>> > Reported-by: Francesco Dolcini <francesco@...cini.it>
>> > Reported-by: kernel test robot <oliver.sang@...el.com>
>> > Closes: https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com
>> > Reported-by: kernel test robot <oliver.sang@...el.com>
>> > Closes: https://lore.kernel.org/oe-lkp/202311161555.3ee16fc9-oliver.sang@intel.com
>> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
>> > Signed-off-by: John Ogness <john.ogness@...utronix.de>
>>
>> Great catch! It must have been complicated to debug this.
>>
>> Reviewed-by: Petr Mladek <pmladek@...e.com>
>
> That said, I am a bit nervous that a bug like this might cause
> workqueue stall and panic() the kernel.
>
> At least, this is how I read
> https://lore.kernel.org/oe-lkp/202311171611.78d41dbe-oliver.sang@intel.com/
Note that the bug is reported for code that mainline will never
have. This patch "fixes" the problem before it is a problem. From the
perspective of mainline, the problem never existed. Perhaps it is
inappropriate to list these with the Closes tag.
> It looks like it caused some loop and refcout overlow or so.
> But I might be wrong.
>
> I would like to better understand this and check if we could prevent
> it somehow.
The code in question can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/kernel/printk/printk_ringbuffer.c?id=768c33033e41ad195a9304ebb183afb730d5ae98#n2031
The URL above points to prb_next_reserve_seq(), which is in an infinite
loop.
desc_last_finalized_seq() is returning a huge garbage value due to the
32bit sequence value 1 being incorrectly mapped to 0xffffffff00000001
(which is 18446744069414584321). When desc_read_finalized_seq() is
called (line 2082), it obviously returns -EINVAL because there is no
record with this huge sequence number.
The code interrupts -EINVAL to mean that the record has been
overwritten, and so it tries again with an updated last finalized
sequence number => infinite loop.
With this patch applied, the 32bit value 1 is correctly mapped to
sequence number 1 and all is OK.
The problem was that desc_last_finalized_seq() returned a sequence
number that was higher than the highest record. That was the bug. As
long as desc_last_finalized_seq() always returns a sequence number that
exists or used to exist, the retry loop is fine. And since the tail
record is always in the finalized state, there will always be a
finalized record available.
John
Powered by blists - more mailing lists