Message-ID: <YzVlYaUPcRmlfE7c@wendy>
Date:   Thu, 29 Sep 2022 10:29:05 +0100
From:   Conor Dooley <conor.dooley@...rochip.com>
To:     Thorsten Leemhuis <regressions@...mhuis.info>
CC:     Conor Dooley <conor@...nel.org>, <pmladek@...e.com>,
        <senozhatsky@...omium.org>, <rostedt@...dmis.org>,
        <john.ogness@...utronix.de>, <linux-kernel@...r.kernel.org>,
        <regressions@...ts.linux.dev>
Subject: Re: [resend][bug] low-probability console lockups since 5.19

On Thu, Sep 29, 2022 at 11:06:01AM +0200, Thorsten Leemhuis wrote:
> Hi Conor
> 
> On 28.09.22 18:55, Conor Dooley wrote:
> > On Fri, Sep 23, 2022 at 05:24:17PM +0100, Conor Dooley wrote:
> >>
> > >> I've been bisecting a bug that is causing a boot failure in my CI & have
> > >> ended up here. The bug in question is a low(ish) probability lock-up
> > >> of the serial console; I would estimate about a 1-in-5 chance on the
> > >> boards I could actually trigger it on, which is why it has taken me so
> > >> long to realise that this was an actual problem. Thinking back on it, there
> >> were other failures that I would retroactively attribute to this
> >> problem too, but I had earlycon disabled
> > [...]
> > #regzbot introduced: 5831788afb17b89c5b531fb60cbd798613ccbb63 ^
> > Hopefully I did this correctly...
> 
> Yes, you did, thx for this. I already had been watching this thread
> manually and was a bit unsure what to do with it.

Great, thanks.

> 
> > I picked that commit as that's where things start going haywire.
> 
> There is one thing I wonder when skimming this thread: was there maybe
> some other change somewhere in the kernel between the introduction and
> the revert of the printk console kthreads patches that is the real
> culprit here that makes existing, older races easier to hit? But I guess
> in the end that would be very hard to find and it's easier to fix the
> problem in the console driver... :-/

It's entirely possible that something arrived in the middle, yeah. I've
done 100s of reboots on that interim section, albeit with the threaded
printers enabled, since I restarted the bisection several times, & never
hit this failure then.

Unfortunately, I don't know anything about console/printk/serial drivers,
so I will almost certainly not be able to find the problem by
inspection. I'd rather submit patches than send reports, but I really,
really need some help here. I looked at the two patterns Petr suggested,
but I am not sure the former applies, since the issue is present even
when earlycon is disabled, & the latter appears (to my untrained eye) to
be accounted for in the 8250 driver.

Thanks,
Conor.
