Date: Mon, 24 Jun 2024 14:15:07 -0700
From: Doug Anderson <dianders@...omium.org>
To: Johan Hovold <johan@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Jiri Slaby <jirislaby@...nel.org>, 
	Yicong Yang <yangyicong@...ilicon.com>, Tony Lindgren <tony@...mide.com>, 
	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>, Johan Hovold <johan+linaro@...nel.org>, 
	John Ogness <john.ogness@...utronix.de>, linux-arm-msm@...r.kernel.org, 
	Bjorn Andersson <andersson@...nel.org>, Konrad Dybcio <konrad.dybcio@...aro.org>, 
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>, 
	Stephen Boyd <swboyd@...omium.org>, linux-serial@...r.kernel.org, 
	linux-kernel@...r.kernel.org, 
	Uwe Kleine-König <u.kleine-koenig@...gutronix.de>, 
	Rob Herring <robh@...nel.org>
Subject: Re: [PATCH v4 8/8] serial: qcom-geni: Rework TX in FIFO mode to fix hangs/lockups

Hi,

On Mon, Jun 24, 2024 at 5:43 AM Johan Hovold <johan@...nel.org> wrote:
>
> On Mon, Jun 10, 2024 at 03:24:26PM -0700, Douglas Anderson wrote:
> > The fact that the Qualcomm GENI hardware interface is based around
> > "packets" is really awkward to fit into Linux's UART design.
> > Specifically, in order to send bytes you need to start up a new
> > "command" saying how many bytes you want to send and then you need to
> > send all those bytes. Once you've committed to sending that number of
> > bytes it's very awkward to change your mind and send fewer, especially
> > if you want to do so without dropping bytes on the ground.
> >
> > There may be a few cases where you might want to send fewer bytes than
> > you originally expected:
> > 1. You might want to interrupt the transfer with something higher
> >    priority, like the kernel console or kdb.
> > 2. You might want to enter system suspend.
> > 3. The user might have killed the program that had queued bytes for
> >    sending over the UART.
> >
> > Despite this awkwardness the Linux driver has still tried to send
> > bytes using large transfers. Whenever the driver started a new
> > transfer it would look at the number of bytes in the OS's queue and
> > start a transfer for that many. The idea of using larger transfers is
> > that it should be more efficient. When you're in the middle of a large
> > transfer you can get interrupted when the hardware FIFO is close to
> > empty and add more bytes in. Whenever you get to the end of a transfer
> > you have to wait until the transfer is totally done before you can add
> > more bytes and, depending on interrupt latency, that can cause the
> > UART to idle a bit.
>
> As I mentioned last week, the slowdown from this is quite noticeable
> (e.g. 25% slowdown at 115200), but this may be the price we need to pay
> for correctness, at least temporarily.
>
> An alternative might be to switch to using a 16-byte FIFO. This should
> reduce console latency even further, and may be able to avoid the idling
> UART penalty by continuing to use the watermark interrupt for refilling
> the FIFO.

I'm a bit confused. Right now we're using (effectively) a 64-byte
FIFO. The FIFO is 16 words deep and we have 4 bytes per word. ...so
I'm not sure what you mean by switching to a 16-byte FIFO. Do you mean
to make less use of the FIFO, or something else?

Overall, the big problem I found in all my testing was that I needed to
wait for a "command done" before kicking off a new command. By the time
the "command done" arrives the UART has already stopped transmitting, and
you've got to suffer an interrupt latency before you can start
transferring again. Essentially:

1. Pick a transfer size.
2. You can keep sending bytes / using the FIFO efficiently as long as
there are still bytes left in the transfer.
3. When you get to the end of the transfer, you have to wait for the
UART to stop, report that it's done, and then suffer an interrupt
latency to start a new transfer.

So to be efficient you want to pick a big transfer size, but if there's
any chance that you might not need to send that many bytes then you need
to figure out how to handle that. If you can handle it properly then
that's great. If not, we have to make sure we never kick off a transfer
that we might not finish.
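
To make that concrete, here's a minimal sketch of the three-step flow.
To be clear, this is NOT the real qcom-geni-serial code; the struct and
function names below are made up purely to show where the idle gap comes
from.

#include <stddef.h>

struct fake_port {
	size_t tx_remaining;	/* bytes left in the committed command */
	size_t fifo_free;	/* free space in the hardware TX FIFO */
	size_t queued;		/* bytes the OS still has queued for us */
};

/* Step 1: commit to a transfer size and kick off a new command. */
static void start_tx_command(struct fake_port *p, size_t len)
{
	p->tx_remaining = len;	/* hardware now expects exactly len bytes */
}

/*
 * Step 2: watermark interrupt. As long as the command still has bytes
 * outstanding we just top up the FIFO and the UART never has to stop.
 */
static void watermark_irq(struct fake_port *p)
{
	while (p->tx_remaining && p->fifo_free && p->queued) {
		p->tx_remaining--;
		p->fifo_free--;
		p->queued--;
	}
}

/*
 * Step 3: "command done" interrupt. By the time this runs the UART has
 * already gone idle, so we eat one interrupt latency before the next
 * command starts transmitting.
 */
static void cmd_done_irq(struct fake_port *p)
{
	if (p->queued)
		start_tx_command(p, p->queued);
}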

I'd also mention that, as talked about in my response to your other
patch [1], I'm not seeing a 25% slowdown. I tested both with my simple
proposal and with this whole series applied and my slowdown is less
than 2%. I guess there must be something different with your setup?
Trying to think about what kind of slowdown would be reasonable for my
patch series at 115200:

a) We send 64 bytes efficiently, which takes 5.6ms (64 * 1000 / 11520)

b) We stop transferring and wait for an interrupt.

c) We start transferring 64 bytes again.

Let's say that your interrupt latency is 1 ms, which would be really
terrible. In that case you'll essentially transfer 64 bytes in 6.6 ms
instead of 5.6 ms, right? That would be an 18% hit. Let's imagine
something more sensible and say that most of the time you can handle
an interrupt in 100 us. That would be about a 1.8% slowdown, which
actually matches what I was seeing. For reference, even an old arm32
rk3288-veyron device I worked with years ago could usually handle
interrupts in ~100-200 us since dwc2 needs you to handle at least one
(sometimes more) interrupt per USB uFrame (125 us).
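
Spelling out that arithmetic as a quick standalone check (this just
assumes 10 bits on the wire per byte: start + 8 data + stop):

#include <stdio.h>

int main(void)
{
	const double bytes_per_sec = 115200.0 / 10.0;		/* 11520 */
	const double xfer_ms = 64.0 / bytes_per_sec * 1000.0;	/* ~5.56 ms */
	const double latency_ms[] = { 1.0, 0.1 };		/* terrible vs. sensible */

	for (int i = 0; i < 2; i++) {
		double total = xfer_ms + latency_ms[i];

		printf("irq latency %.1f ms: %.2f ms per 64 bytes (%.1f%% slowdown)\n",
		       latency_ms[i], total, (total / xfer_ms - 1.0) * 100.0);
	}
	return 0;
}

That prints roughly 6.56 ms / 18% for the 1 ms case and 5.66 ms / 1.8%
for the 100 us case, which is where the numbers above come from.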

...so I'm confused about where your 25% number is coming from...


[1] https://lore.kernel.org/r/CAD=FV=UwyzA614tDoq7BntW1DWmic=DOszr+iRJVafVEYrXhpw@mail.gmail.com
