Message-ID: <ZoZyZcVyLvI9t4fH@hovoldconsulting.com>
Date: Thu, 4 Jul 2024 11:59:01 +0200
From: Johan Hovold <johan@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: Johan Hovold <johan+linaro@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>,
Konrad Dybcio <konrad.dybcio@...aro.org>,
Bjorn Andersson <andersson@...nel.org>,
linux-arm-msm@...r.kernel.org, linux-serial@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] serial: qcom-geni: fix hard lockup on buffer flush

On Tue, Jun 25, 2024 at 04:40:36PM +0200, Johan Hovold wrote:
> On Mon, Jun 24, 2024 at 10:39:07AM -0700, Doug Anderson wrote:
> > On Mon, Jun 24, 2024 at 6:31 AM Johan Hovold <johan+linaro@...nel.org> wrote:
> > >
> > > The Qualcomm GENI serial driver does not handle buffer flushing and used
> > > to print garbage characters when the circular buffer was cleared. Since
> > > commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo") this
> > > instead results in a lockup due to qcom_geni_serial_send_chunk_fifo()
> > > spinning indefinitely in the interrupt handler.
> > >
> > > This is easily triggered by interrupting a command such as dmesg in a
> > > serial console but can also happen when stopping a serial getty on
> > > reboot.
> > >
> > > Fix the immediate issue by printing NUL characters until the current TX
> > > command has been completed.
> > I don't love this, though it's better than a hard lockup. I will note
> > that it doesn't exactly restore the old behavior which would have
> > (most likely) continued to output data that had previously been in the
> > FIFO but that had been cancelled.
>
> Ah, yes, you're right. I went back and compared with 6.9 and the effect
> was indeed (often) that the machine felt sluggish when you hit ctrl-c to
> interrupt something like dmesg and the driver would continue to print up
> to 4k characters after that (e.g. about 350 ms at 115200 baud).
>
> The idea here was to fix the lockup regression separately and then have
> the third patch address the buffer flush failure, which could also be
> backported without depending on the kfifo conversion.
>
> But running with this series since yesterday, I realise there are still
> some unresolved interactions with the console code, which can now trigger
> a soft (instead of hard) lockup on reboot...

I've reworked my series to avoid the remaining lockup, which was due to
v1 not handling some cases where cancelling a command left stale data in
the fifo.

I've also reordered the patches to avoid printing NUL characters as an
intermediate fix.

Johan