Message-ID: <YrCDNqsPrY+Hs9ju@alley>
Date:   Mon, 20 Jun 2022 16:24:54 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Marek Behún <kabel@...nel.org>,
        John Ogness <john.ogness@...utronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Jan Kara <jack@...e.cz>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: Boot stall regression from "printk for 5.19" merge

On Mon 2022-06-20 08:48:29, Linus Torvalds wrote:
> On Mon, Jun 20, 2022 at 6:44 AM Petr Mladek <pmladek@...e.com> wrote:
> >
> > Both early console and proper console driver has its own kthread.
> >
> > >    1.166486] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 22, base_baud = 12500000) is a 16550A
> >
> > The line is malformed. I wonder if both early console and proper
> > console used the same port in parallel.
> 
> Honestly, I get the feeling that we need to just revert the whole
> "console from thread" thing.
> 
> Because:
> 
> > So, it looks like that con->write() code is not correctly serialized
> > between the early and normal console.
> > [ ... ]
> > I am going to check the driver...
> 
> We really cannot be in the situation that some random driver that used
> to work no longer does, and causes oopses and/or memory corruption
> just because it's now entered differently from how it traditionally
> has been.
>
> The traditional console write code has always been very careful to get
> exclusive access, and it sounds like that is just plain broken now.
> 
> So I don't think this is a "driver is broken".

I see your point. There might be many problems with the drivers
because they were never used this way. It looks like we opened a can
of worms. It is even more problematic because it causes silent boot
crashes that are hard to debug.

I kind of agree with this and I have started looking at a more
generic solution.

All these boot crashes happened in exactly the same situation: the
proper console was initialized and registered while the early
console was still in use. It is a problem because they use the
same port.

The parallel use of different consoles should be much safer
because they are much more independent.

There are the following possibilities:

1. Enable the kthreads later after the early consoles are gone.
   This is easy and should fix all known boot problems.

2. Temporarily stop the kthreads and use direct printing when
   the proper consoles are registered. Well, this might be
   more complicated because the port might be accessed
   also before register_console() is called.

3. Another solution would be to use the global console_lock()
   also to synchronize the kthreads against each other. But
   it would be unfortunate.

I am going to prepare the 1st solution.
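
To illustrate the idea behind the 1st solution, here is a rough userspace
model, not the actual patch. All names (model_console,
model_kthreads_allowed(), model_drop_early_consoles()) are hypothetical
and are not kernel API. It only shows the rule that threaded printing
stays disabled while any early console is still registered:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name here is hypothetical, not kernel API. */
enum model_con_type { MODEL_CON_EARLY, MODEL_CON_PROPER };

struct model_console {
	const char *name;
	enum model_con_type type;
	bool registered;
};

/*
 * Threaded printing is allowed only when no early console is registered,
 * so an early and a proper console never drive the same port in parallel.
 */
static bool model_kthreads_allowed(const struct model_console *cons, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (cons[i].registered && cons[i].type == MODEL_CON_EARLY)
			return false;
	}
	return true;
}

/* Mark all early consoles as unregistered (late boot in this model). */
static void model_drop_early_consoles(struct model_console *cons, size_t n)
{
	assert(cons != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (cons[i].type == MODEL_CON_EARLY)
			cons[i].registered = false;
	}
}
```

In this model the check returns false while "earlycon" and "ttyS0" are
both registered, and true once the early consoles have been dropped.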

Best Regards,
Petr
