[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZnrW5EcGKGYzS8qf@hovoldconsulting.com>
Date: Tue, 25 Jun 2024 16:40:36 +0200
From: Johan Hovold <johan@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: Johan Hovold <johan+linaro@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>,
Konrad Dybcio <konrad.dybcio@...aro.org>,
Bjorn Andersson <andersson@...nel.org>,
linux-arm-msm@...r.kernel.org, linux-serial@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] serial: qcom-geni: fix hard lockup on buffer flush
On Mon, Jun 24, 2024 at 10:39:07AM -0700, Doug Anderson wrote:
> On Mon, Jun 24, 2024 at 6:31 AM Johan Hovold <johan+linaro@...nel.org> wrote:
> >
> > The Qualcomm GENI serial driver does not handle buffer flushing and used
> > to print garbage characters when the circular buffer was cleared. Since
> > commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo") this
> > instead results in a lockup due to qcom_geni_serial_send_chunk_fifo()
> > spinning indefinitely in the interrupt handler.
> >
> > This is easily triggered by interrupting a command such as dmesg in a
> > serial console but can also happen when stopping a serial getty on
> > reboot.
> >
> > Fix the immediate issue by printing NUL characters until the current TX
> > command has been completed.
> >
> > Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> > Reported-by: Douglas Anderson <dianders@...omium.org>
> > Signed-off-by: Johan Hovold <johan+linaro@...nel.org>
> > ---
> > drivers/tty/serial/qcom_geni_serial.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> I don't love this, though it's better than a hard lockup. I will note
> that it doesn't exactly restore the old behavior which would have
> (most likely) continued to output data that had previously been in the
> FIFO but that had been cancelled.
Ah, yes, you're right. I went back and compared with 6.9 and the effect
was indeed (often) that the machine felt sluggish when you hit ctrl-c to
interrupt something like dmesg and the driver would continue to print up
to 4k characters after that (e.g. 350 ms at 115200).
The idea here was to fix the lockup regression separately and then have
the third patch address the buffer flush failure, which could also be
backported without depending on the kfifo conversion.
But running with this series since yesterday, I realise there are still
some unresolved interaction with the console code, which can now trigger
a soft (instead of hard) lockup on reboot...
> ...actually, if we're looking for a short term fix that mimics the old
> behavior more closely, what would you think about having a
> driver-local buffer that we fill when we kick off the transfer. Then
> the data can't go away from underneath us. It's an extra copy, but
> it's just a memory-to-memory copy which is much faster than the MMIO
> copy we'll eventually need to do anyway... This local buffer would
> essentially act as a larger FIFO.
The idea did cross my mind, three levels of fifo...
> You could choose the local buffer size to balance being able to cancel
> quickly vs. using the FIFO efficiently.
Yeah, perhaps adding a smaller driver kfifo would work, but not sure how
clean it would be to implement.
Johan
Powered by blists - more mailing lists