Message-ID: <ZoZyZcVyLvI9t4fH@hovoldconsulting.com>
Date: Thu, 4 Jul 2024 11:59:01 +0200
From: Johan Hovold <johan@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: Johan Hovold <johan+linaro@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jiri Slaby <jirislaby@...nel.org>,
	Konrad Dybcio <konrad.dybcio@...aro.org>,
	Bjorn Andersson <andersson@...nel.org>,
	linux-arm-msm@...r.kernel.org, linux-serial@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] serial: qcom-geni: fix hard lockup on buffer flush

On Tue, Jun 25, 2024 at 04:40:36PM +0200, Johan Hovold wrote:
> On Mon, Jun 24, 2024 at 10:39:07AM -0700, Doug Anderson wrote:
> > On Mon, Jun 24, 2024 at 6:31 AM Johan Hovold <johan+linaro@...nel.org> wrote:
> > >
> > > The Qualcomm GENI serial driver does not handle buffer flushing and used
> > > to print garbage characters when the circular buffer was cleared. Since
> > > commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo") this
> > > instead results in a lockup due to qcom_geni_serial_send_chunk_fifo()
> > > spinning indefinitely in the interrupt handler.
> > >
> > > This is easily triggered by interrupting a command such as dmesg in a
> > > serial console but can also happen when stopping a serial getty on
> > > reboot.
> > >
> > > Fix the immediate issue by printing NUL characters until the current TX
> > > command has been completed.
> >
> > I don't love this, though it's better than a hard lockup. I will note
> > that it doesn't exactly restore the old behavior which would have
> > (most likely) continued to output data that had previously been in the
> > FIFO but that had been cancelled.
> 
> Ah, yes, you're right. I went back and compared with 6.9 and the effect
> was indeed (often) that the machine felt sluggish when you hit ctrl-c to
> interrupt something like dmesg and the driver would continue to print up
> to 4k characters after that (e.g. 350 ms at 115200).
> 
> The idea here was to fix the lockup regression separately and then have
> the third patch address the buffer flush failure, which could also be
> backported without depending on the kfifo conversion.
> 
> But running with this series since yesterday, I realise there are still
> some unresolved interactions with the console code, which can now trigger
> a soft (instead of hard) lockup on reboot...
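
For reference, the "350 ms at 115200" estimate quoted above can be roughly
reproduced with a little arithmetic. This is only a sketch under two
assumptions that are not stated in the thread: that "4k characters" means
4096 bytes, and that the line uses 8N1 framing (10 bits on the wire per
character). The helper name drain_time_ms is made up for illustration.

```python
def drain_time_ms(num_chars: int, baud: int, bits_per_char: int = 10) -> float:
    """Milliseconds needed to shift num_chars out of a UART.

    bits_per_char defaults to 10, which assumes 8N1 framing
    (1 start bit + 8 data bits + 1 stop bit).
    """
    return num_chars * bits_per_char * 1000 / baud


if __name__ == "__main__":
    # Assumed 4096-byte buffer at 115200 baud.
    print(f"{drain_time_ms(4096, 115200):.1f} ms")  # prints "355.6 ms"
```

With these assumptions the result is about 356 ms, in line with the rough
figure given above.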

I've reworked my series to avoid the remaining lockup, which was due to
v1 not handling some cases where cancelling a command left stale data in
the fifo.

I've also reordered the patches to avoid printing NUL characters as an
intermediate fix.

Johan
