lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 24 Jun 2024 13:45:17 -0700
From: Doug Anderson <dianders@...omium.org>
To: Johan Hovold <johan+linaro@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Jiri Slaby <jirislaby@...nel.org>, 
	Konrad Dybcio <konrad.dybcio@...aro.org>, Bjorn Andersson <andersson@...nel.org>, 
	linux-arm-msm@...r.kernel.org, linux-serial@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] serial: qcom-geni: fix hard lockup on buffer flush

Hi,

On Mon, Jun 24, 2024 at 10:39 AM Doug Anderson <dianders@...omium.org> wrote:
>
> Hi,
>
> On Mon, Jun 24, 2024 at 6:31 AM Johan Hovold <johan+linaro@...nel.org> wrote:
> >
> > The Qualcomm GENI serial driver does not handle buffer flushing and used
> > to print garbage characters when the circular buffer was cleared. Since
> > commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo") this
> > instead results in a lockup due to qcom_geni_serial_send_chunk_fifo()
> > spinning indefinitely in the interrupt handler.
> >
> > This is easily triggered by interrupting a command such as dmesg in a
> > serial console but can also happen when stopping a serial getty on
> > reboot.
> >
> > Fix the immediate issue by printing NUL characters until the current TX
> > command has been completed.
> >
> > Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> > Reported-by: Douglas Anderson <dianders@...omium.org>
> > Signed-off-by: Johan Hovold <johan+linaro@...nel.org>
> > ---
> >  drivers/tty/serial/qcom_geni_serial.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> I don't love this, though it's better than a hard lockup. I will note
> that it doesn't exactly restore the old behavior which would have
> (most likely) continued to output data that had previously been in the
> FIFO but that had been cancelled.
>
> ...actually, if we're looking for a short term fix that mimics the old
> behavior more closely, what would you think about having a
> driver-local buffer that we fill when we kick off the transfer. Then
> the data can't go away from underneath us. It's an extra copy, but
> it's just a memory-to-memory copy which is much faster than the MMIO
> copy we'll eventually need to do anyway... This local buffer would
> essentially act as a larger FIFO.
>
> You could choose the local buffer size to balance being able to cancel
> quickly vs. using the FIFO efficiently.

Also: if we're looking at quick/easy to land and just fix the hard
lockup, I'd vote for this (I can send a real patch, though I'm about
to go on vacation):

--

@@ -904,8 +904,8 @@ static void qcom_geni_serial_handle_tx_fifo(struct
uart_port *uport,
                goto out_write_wakeup;

        if (!port->tx_remaining) {
-               qcom_geni_serial_setup_tx(uport, pending);
-               port->tx_remaining = pending;
+               port->tx_remaining = min(avail, pending);
+               qcom_geni_serial_setup_tx(uport, port->tx_remaining);

                irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
                if (!(irq_en & M_TX_FIFO_WATERMARK_EN))

--

That will fix the hard lockup, is short and sweet, and also doesn't
end up outputting NUL bytes.

I measured time with that. I've been testing with a file I created
called "alphabet.txt" that just contains the letters A-Z repeated 3
times followed by a "\n", over and over again. I think gmail will kill
me with word wrapping, but basically:

ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ
...
...

FWIW:

head -200 /var/alphabet.txt  | wc
    200     200   15800

Before my patch I ran `time head -200 /var/alphabet.txt` and I got:

real    0m1.386s

After my patch I ran the same thing and got:

real    0m1.409s

So it's slower, but that's not 25% slower. I get 1.7% slower:

In [6]: (1.409 - 1.386) / 1.386 * 100
Out[6]: 1.659451659451669

IMO that seems like a fine slowdown in order to avoid printing NUL bytes.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ