lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 13 Aug 2020 13:59:48 +0200
From:   Greg KH <gregkh@...uxfoundation.org>
To:     Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Guenter Roeck <linux@...ck-us.net>
Cc:     linux-serial@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Recursive/circular locking in
 serial8250_console_write/serial8250_do_startup

On Wed, Aug 12, 2020 at 10:06:29PM -0700, Guenter Roeck wrote:
> On Wed, Aug 12, 2020 at 08:48:13AM -0700, Guenter Roeck wrote:
> > Hi,
> > 
> > crbug.com/1114800 reports a hard lockup due to circular locking in the
> > 8250 console driver. This is seen if CONFIG_PROVE_LOCKING is enabled.
> > 
> > Problem is as follows:
> > - serial8250_do_startup() locks the serial (console) port.
> > - serial8250_do_startup() then disables interrupts if interrupts are
> >   shared, by calling disable_irq_nosync().
> > - disable_irq_nosync() calls __irq_get_desc_lock() to lock the interrupt
> >   descriptor.
> > - __irq_get_desc_lock() calls lock_acquire()
> > - If CONFIG_PROVE_LOCKING is enabled, validate_chain() and check_noncircular()
> >   are called and identify a potential locking error.
> > - This locking error is reported via printk, which ultimately calls
> >   serial8250_console_write().
> > - serial8250_console_write() tries to lock the serial console port.
> >   Since it is already locked, the system hangs and ultimately reports
> >   a hard lockup.
> > 
> > I understand we'll need to figure out and fix what lockdep complains about,
> > and I am working on that. However, even if that is fixed, we'll need a
> > solution for the recursive lock: Fixing the lockdep problem doesn't
> > guarantee that a similar problem (or some other log message) won't be
> > detected and reported sometime in the future while serial8250_do_startup()
> > holds the console port lock.
> > 
> > Ideas, anyone ? Everything I came up with so far seems clumsy and hackish.
> > 
> 
> Turns out the situation is a bit worse than I thought. disable_irq_nosync(),
> when called from serial8250_do_startup(), locks the interrupt descriptor.
> The order of locking is
> 	serial port lock
> 	  interrupt descriptor lock
> 
> At the same time, __setup_irq() locks the interrupt descriptor as well.
> With the descriptor locked, it may report an error message using pr_err().
> This in turn may call serial8250_console_write(), which will try to lock
> the console serial port. The lock sequence is
> 	interrupt descriptor lock
> 	  serial port lock
> 
> I added the lockdep splat to the bug log at crbug.com/1114800.
> 
> Effectively, I think, this means we can't call disable_irq_nosync()
> while holding a serial port lock, or at least not while holding a
> serial port lock that is associated with a console.
> 
> The problem was introduced (or, rather, exposed) with upstream commit
> 7febbcbc48fc ("serial: 8250: Check UPF_IRQ_SHARED in advance").

Adding Andy, who wrote the above commit :)

Andy, any thoughts?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ