lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZnvJQDX6NkyRCA8y@hovoldconsulting.com>
Date: Wed, 26 Jun 2024 09:54:40 +0200
From: Johan Hovold <johan@...nel.org>
To: Doug Anderson <dianders@...omium.org>
Cc: Johan Hovold <johan+linaro@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jiri Slaby <jirislaby@...nel.org>,
	Konrad Dybcio <konrad.dybcio@...aro.org>,
	Bjorn Andersson <andersson@...nel.org>,
	linux-arm-msm@...r.kernel.org, linux-serial@...r.kernel.org,
	linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 2/3] serial: qcom-geni: fix soft lockup on sw flow
 control and suspend

On Mon, Jun 24, 2024 at 02:58:39PM -0700, Doug Anderson wrote:
> On Mon, Jun 24, 2024 at 2:23 PM Doug Anderson <dianders@...omium.org> wrote:
> > On Mon, Jun 24, 2024 at 6:31 AM Johan Hovold <johan+linaro@...nel.org> wrote:

> > > +static void qcom_geni_serial_clear_tx_fifo(struct uart_port *uport)
> > > +{
> > > +       struct qcom_geni_serial_port *port = to_dev_port(uport);
> > > +
> > >         if (!qcom_geni_serial_main_active(uport))
> > >                 return;
> > >
> > > +       /*
> > > +        * Increase watermark level so that TX can be restarted and wait for
> > > +        * sequencer to start to prevent lockups.
> > > +        */
> > > +       writel(port->tx_fifo_depth, uport->membase + SE_GENI_TX_WATERMARK_REG);
> > > +       qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS,
> > > +                                       M_TX_FIFO_WATERMARK_EN, true);
> >
> > Oh, maybe this "wait for sequencer to start to prevent lockups." is
> > the part that I was missing? Can you explain more about what's going
> > on here? Why does waiting for the watermark interrupt to fire prevent
> > lockups? I would have imagined that the watermark interrupt would be
> > part of the geni hardware and have nothing to do with the firmware
> > running on the other end, so I'm not sure why it firing somehow would
> > prevent a lockup. Was this just by trial and error?
> 
> Actually, the more I look at it the more confused I am about your
> qcom_geni_serial_clear_tx_fifo(). Can you explain and maybe add some
> inline comments in the function since it's not obvious? Specifically,
> things I'm confused about with your patch:
> 
> 1. The function is named qcom_geni_serial_clear_tx_fifo() which
> implies that when it finishes that the hardware FIFO will have nothing
> in it. ...but how does your code ensure this?

Yeah, I realised after I sent out the series that this may not be the
case. I was under the impression that cancelling a command would discard
the data in the FIFO (e.g. when starting the next command) but that was
probably an error in my mental model.

Do you see any way to discard the FIFO in the docs you have access to?
 
> 2. If the function is really clearing the FIFOs then why do we need to
> adjust the watermark level? The fact that you need to adjust the
> watermark levels implies (to me) that there are things stuck in the
> FIFO still. ...but then what happens to those characters? When are
> they sent?

Exactly, there is data there according to the FIFO status, but I
erroneously interpreted it as a it would be discarded (e.g. when
starting the next command).

> 3. On my hardware you're setting the FIFO level to 16 here. The docs I
> have say that if the FIFO level is "less than" the value you set here
> then the interrupt will go off and further clarifies that if you set
> the register to 1 here then you'll get interrupted when the FIFO is
> empty. So what happens with your solution if the FIFO is completely
> full? In that case you'd have to set this to 17, right? ...but then I
> could believe that might confuse the interrupt handler which would get
> told to start transmitting when there is no room for anything.

Indeed. I may implicitly be relying on the absence of hardware flow
control as well so that waiting for one character to be sent is what
makes this work.

> Maybe something is missing in my mental model here and testing your
> patch and hitting Ctrl-C seems to work, but I don't really understand
> why so hopefully you can clarify! :-)

I spent too much time trying to reverse engineer this hw over the
weekend and wanted to get this out as a counter proposal as it indicated
that we may be able to find a smaller fix. The series addresses the
serial getty issues, but as I mentioned yesterday there is some
interaction with the console left to be resolved before this can be an
alternative.

If it wasn't for the hard lockup I would have sent the whole thing as
an RFC.

Johan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ