Message-ID: <CAD=FV=UwyzA614tDoq7BntW1DWmic=DOszr+iRJVafVEYrXhpw@mail.gmail.com>
Date: Mon, 24 Jun 2024 13:45:17 -0700
From: Doug Anderson <dianders@...omium.org>
To: Johan Hovold <johan+linaro@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Jiri Slaby <jirislaby@...nel.org>,
Konrad Dybcio <konrad.dybcio@...aro.org>, Bjorn Andersson <andersson@...nel.org>,
linux-arm-msm@...r.kernel.org, linux-serial@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] serial: qcom-geni: fix hard lockup on buffer flush
Hi,
On Mon, Jun 24, 2024 at 10:39 AM Doug Anderson <dianders@...omium.org> wrote:
>
> Hi,
>
> On Mon, Jun 24, 2024 at 6:31 AM Johan Hovold <johan+linaro@...nel.org> wrote:
> >
> > The Qualcomm GENI serial driver does not handle buffer flushing and used
> > to print garbage characters when the circular buffer was cleared. Since
> > commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo") this
> > instead results in a lockup due to qcom_geni_serial_send_chunk_fifo()
> > spinning indefinitely in the interrupt handler.
> >
> > This is easily triggered by interrupting a command such as dmesg in a
> > serial console but can also happen when stopping a serial getty on
> > reboot.
> >
> > Fix the immediate issue by printing NUL characters until the current TX
> > command has been completed.
> >
> > Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> > Reported-by: Douglas Anderson <dianders@...omium.org>
> > Signed-off-by: Johan Hovold <johan+linaro@...nel.org>
> > ---
> > drivers/tty/serial/qcom_geni_serial.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> I don't love this, though it's better than a hard lockup. I will note
> that it doesn't exactly restore the old behavior which would have
> (most likely) continued to output data that had previously been in the
> FIFO but that had been cancelled.
>
> ...actually, if we're looking for a short term fix that mimics the old
> behavior more closely, what would you think about having a
> driver-local buffer that we fill when we kick off the transfer. Then
> the data can't go away from underneath us. It's an extra copy, but
> it's just a memory-to-memory copy which is much faster than the MMIO
> copy we'll eventually need to do anyway... This local buffer would
> essentially act as a larger FIFO.
>
> You could choose the local buffer size to balance being able to cancel
> quickly vs. using the FIFO efficiently.
Also: if we're looking for something quick and easy to land that just
fixes the hard lockup, I'd vote for this (I can send a real patch,
though I'm about to go on vacation):
--
@@ -904,8 +904,8 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
 		goto out_write_wakeup;
 
 	if (!port->tx_remaining) {
-		qcom_geni_serial_setup_tx(uport, pending);
-		port->tx_remaining = pending;
+		port->tx_remaining = min(avail, pending);
+		qcom_geni_serial_setup_tx(uport, port->tx_remaining);
 
 		irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
 		if (!(irq_en & M_TX_FIFO_WATERMARK_EN))
--
That will fix the hard lockup, is short and sweet, and also doesn't
end up outputting NUL bytes.
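
To illustrate why sizing the TX command this way should avoid the lockup, here is a toy model in Python. It is only a sketch of the idea, not the driver: `FIFO_BYTES`, `stranded_after_flush()` and the "one interrupt, then a flush" sequence are made up for illustration, and only `avail`, `pending` and `tx_remaining` mirror names from the diff. It assumes the hardware FIFO starts empty.

```python
FIFO_BYTES = 64  # hypothetical FIFO capacity in bytes, for illustration only


def stranded_after_flush(queued, clamp):
    """Bytes the hardware still expects after the software queue is flushed.

    Toy model of one TX interrupt followed by a flush; not the real driver.
    """
    xmit = bytearray(b"A" * queued)   # stands in for the tty xmit kfifo
    avail = FIFO_BYTES                # assume the hardware FIFO starts empty
    pending = len(xmit)

    # Size the TX command: the whole queue (old) or only what fits now (new).
    tx_remaining = min(avail, pending) if clamp else pending

    # Copy one chunk into the hardware FIFO.
    chunk = min(avail, tx_remaining)
    del xmit[:chunk]
    tx_remaining -= chunk

    # The queue is flushed (e.g. Ctrl-C during dmesg) before the next interrupt.
    xmit.clear()

    # Anything left was promised to the hardware but can no longer be supplied.
    return tx_remaining


print(stranded_after_flush(200, clamp=False))  # 136
print(stranded_after_flush(200, clamp=True))   # 0
```

In the unclamped case the command still wants 136 bytes that no longer exist, which is the state the interrupt handler cannot get out of. With the clamp, every command is fully fed in the same pass that set it up, so a later flush has nothing to strand.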
I also timed it with that change. I've been testing with a file I created
called "alphabet.txt" that just contains the letters A-Z repeated 3
times followed by a "\n", over and over again. I think gmail will kill
me with word wrapping, but basically:
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
...
...
FWIW:
head -200 /var/alphabet.txt | wc
200 200 15800
Before my patch I ran `time head -200 /var/alphabet.txt` and I got:
real 0m1.386s
After my patch I ran the same thing and got:
real 0m1.409s
So it's slower, but nowhere near 25% slower. I get about 1.7% slower:
In [6]: (1.409 - 1.386) / 1.386 * 100
Out[6]: 1.659451659451669
IMO that seems like a fine slowdown in order to avoid printing NUL bytes.
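
For anyone who wants to re-check the numbers, a short Python sketch that re-derives the `wc` byte count and the slowdown from the figures quoted in this message. The variable names are just illustrative, and the two timings are the single runs reported here, not averages.

```python
# Each line of alphabet.txt: A-Z repeated 3 times plus a trailing newline.
line_bytes = 26 * 3 + 1
total_bytes = 200 * line_bytes      # what `head -200 ... | wc` should report

before_s = 1.386                    # "real" time before the patch
after_s = 1.409                     # "real" time after the patch
slowdown_pct = (after_s - before_s) / before_s * 100

print(total_bytes)                  # 15800
print(round(slowdown_pct, 2))       # 1.66
```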