Message-ID: <YzVlYaUPcRmlfE7c@wendy>
Date: Thu, 29 Sep 2022 10:29:05 +0100
From: Conor Dooley <conor.dooley@...rochip.com>
To: Thorsten Leemhuis <regressions@...mhuis.info>
CC: Conor Dooley <conor@...nel.org>, <pmladek@...e.com>,
<senozhatsky@...omium.org>, <rostedt@...dmis.org>,
<john.ogness@...utronix.de>, <linux-kernel@...r.kernel.org>,
<regressions@...ts.linux.dev>
Subject: Re: [resend][bug] low-probability console lockups since 5.19
On Thu, Sep 29, 2022 at 11:06:01AM +0200, Thorsten Leemhuis wrote:
> Hi Conor
>
> On 28.09.22 18:55, Conor Dooley wrote:
> > On Fri, Sep 23, 2022 at 05:24:17PM +0100, Conor Dooley wrote:
> >>
> > >> Been bisecting a bug that is causing a boot failure in my CI & have
> > >> ended up here. The bug in question is a low(ish) probability lock up
> > >> of the serial console; I would estimate about a 1-in-5 chance on the
> > >> boards I could actually trigger it on, which is why it has taken me
> > >> so long to realise that this was an actual problem. Thinking back on
> > >> it, there were other failures that I would retroactively attribute to
> > >> this problem too, but I had earlycon disabled
> > [...]
> > #regzbot introduced: 5831788afb17b89c5b531fb60cbd798613ccbb63 ^
> > Hopefully I did this correctly...
>
> Yes, you did, thx for this. I already had been watching this thread
> manually and was a bit unsure what to do with it.
Great, thanks.
>
> > I picked that commit as that's where things start going haywire.
>
> There is one thing I wonder when skimming this thread: was there maybe
> some other change somewhere in the kernel between the introduction and
> the revert of the printk console kthreads patches that is the real
> culprit here that makes existing, older races easier to hit? But I guess
> in the end that would be very hard to find and it's easier to fix the
> problem in the console driver... :-/
Entirely possible that something arrived in the middle, yeah. I've done
100s of reboots on that interim section, albeit with the threaded
printers enabled, since I restarted the bisection several times, & never
hit this failure then.

I don't know anything about console/printk/serial drivers unfortunately,
so I will almost certainly not be able to find the problem by
inspection. I'd rather submit patches than send reports, but I really
do need some help here. I looked at the two patterns Petr suggested,
but I am not sure the former applies, since the issue is present even
when earlycon is disabled, & the latter appears (to my untrained eye) to
be accounted for in the 8250 driver.
Thanks,
Conor.