linux-kernel - Re: [PATCH v3] tty: serial: msm_serial: avoid system lockup condition

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAF6AEGt9=xND3TyJy5HDg3RbXB=dxzj6oqBxKX9MaC5O2=JYVQ@mail.gmail.com>
Date:   Mon, 10 Jun 2019 12:39:11 -0700
From:   Rob Clark <robdclark@...il.com>
To:     Jorge Ramirez <jorge.ramirez-ortiz@...aro.org>
Cc:     Greg KH <gregkh@...uxfoundation.org>, agross@...nel.org,
        David Brown <david.brown@...aro.org>, jslaby@...e.com,
        linux-arm-msm <linux-arm-msm@...r.kernel.org>,
        linux-serial@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        khasim.mohammed@...aro.org,
        Bjorn Andersson <bjorn.andersson@...aro.org>
Subject: Re: [PATCH v3] tty: serial: msm_serial: avoid system lockup condition

On Mon, Jun 10, 2019 at 12:11 PM Jorge Ramirez
<jorge.ramirez-ortiz@...aro.org> wrote:
>
> On 6/10/19 19:53, Rob Clark wrote:
> > On Mon, Jun 10, 2019 at 10:23 AM Jorge Ramirez-Ortiz
> > <jorge.ramirez-ortiz@...aro.org> wrote:
> >> The function msm_wait_for_xmitr can be taken with interrupts
> >> disabled. In order to avoid a potential system lockup - demonstrated
> >> under stress testing conditions on SoC QCS404/5 - make sure we wait
> >> for a bounded amount of time.
> >>
> >> Tested on SoC QCS404.
> >>
> >> Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@...aro.org>
> >
> > I had observed that heavy UART traffic would lockup the system (on
> > sdm845, but I guess same serial driver)?
> >
> > But a comment from the peanut gallary:  wouldn't this fix lead to TX
> > corruption, ie. writing more into TX fifo before hw is ready?  I
> > haven't looked closely at the driver, but a way to wait without irqs
> > disabled would seem nicer..
> >
> > BR,
> > -R
> >
>
> I think sdm845 uses a different driver (qcom_geni_serial.c) but yes in
> any case we need to determine the sequence leading to the lockup. In our
> internal releases we are adding additional debug information to try to
> capture this info.

ahh, ok.. perhaps qcom_geni_serial has a similar issue.. fwiw where I
tend to hit it is debugging mesa, bugs that can trigger GPU lockups
can tricker a lot of them, and a lot of dmesg spew.  Which in turn
seems to freeze usb (? I think.. I'm using a usb-c ethernet adapter)
making it hard to ctrl-c the thing that  is causing the GPU lockups in
the first place.

> But also I dont think this means that the safety net should not be used

yeah, probably not worse than the current state.. although a proper
solution would be nice

> btw, do you think that perhaps we should add a WARN_ONCE() on timeout?.

not sure if backtrace adds much value here.. but perhaps a (very)
ratelimited warning msg?  You don't want to make the underlying
problem too much worse with too much debug msg but some hint about
what is happening could be useful.

BR,
-R