Message-ID: <CAD=FV=VHHCsjJmVWDXN4g3U=-_SLWc2iWqbAdZPOykn+QMQojw@mail.gmail.com>
Date:   Thu, 7 Dec 2023 08:49:16 -0800
From:   Doug Anderson <dianders@...omium.org>
To:     Kalle Valo <kvalo@...nel.org>
Cc:     Yongqin Liu <yongqin.liu@...aro.org>, ath10k@...ts.infradead.org,
        Abhishek Kumar <kuabhs@...omium.org>,
        Youghandhar Chintala <quic_youghand@...cinc.com>,
        linux-kernel@...r.kernel.org, linux-wireless@...r.kernel.org,
        Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        John Stultz <jstultz@...gle.com>,
        Viktor Martensson <vmartensson@...gle.com>,
        Amit Pundir <amit.pundir@...aro.org>
Subject: Re: [PATCH] ath10k: Don't touch the CE interrupt registers after
 power up

Hi,

On Thu, Dec 7, 2023 at 6:49 AM Kalle Valo <kvalo@...nel.org> wrote:
>
> > Recently during our Android build test on the Dragonboard 845c board,
> > with the Android Common Kernel android11-5.4-lts and android12-5.4-lts branches,
> >
> > we found there are some ufshcd related changes printed,
> > and the serial console gets stuck, no response for input,
> > and the Android boot is stuck at the animation window.
> >
> > The problem is reported here
> >     https://issuetracker.google.com/issues/314366682
> > You could check there for more log details.
> >
> > And with some bisection, I found it's related to this commit,
> > when I revert this commit, the problem is gone.
> >
> > So replied here, not sure if you have any idea about it,
> > or any suggestions on what we should do next to resolve the problem?
>
> FWIW we don't support Android kernels, only kernel.org releases.

Right. If the problem also reproduces on mainline Linux then that
would be interesting to know. I think db845c is at least somewhat well
supported by mainline so it should be possible to test it there.

If I had to guess, I'd think that probably the CE interrupts are
firing nonstop for you and not getting handled. Then those constant
interrupts are (presumably) causing the UFS controller to time out. If
this is true, the question is: why? Maybe you could use ftrace to
confirm this by adding some traces to
ath10k_snoc_per_engine_handler()? There's a way to get ftrace buffers
dumped on panic (or, if you use kdb, it has a command for it).
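
As a rough sketch of that setup (not something tested on db845c), the
function tracer can be limited to the handler and the buffer dumped on
an oops or panic. It assumes a kernel built with CONFIG_FUNCTION_TRACER,
tracefs mounted at /sys/kernel/tracing, and the ath10k_snoc module
already loaded. The helper name setup_ce_irq_trace and its two directory
arguments are hypothetical, added only so the paths can be overridden.

```shell
# Hypothetical helper: trace only ath10k_snoc_per_engine_handler() and
# ask the kernel to dump the ftrace buffer to the console on an oops.
# $1: tracefs directory (assumed default: /sys/kernel/tracing)
# $2: sysctl directory  (assumed default: /proc/sys/kernel)
setup_ce_irq_trace() {
    tracefs="${1:-/sys/kernel/tracing}"
    sysctl_dir="${2:-/proc/sys/kernel}"

    # Restrict the function tracer to the CE interrupt handler.
    echo ath10k_snoc_per_engine_handler > "$tracefs/set_ftrace_filter"
    echo function > "$tracefs/current_tracer"
    echo 1 > "$tracefs/tracing_on"

    # Dump the ftrace ring buffer when the kernel oopses or panics.
    echo 1 > "$sysctl_dir/ftrace_dump_on_oops"
}
```

Run setup_ce_irq_trace as root before triggering the hang. If the
handler really is firing nonstop, the dumped buffer should be full of
back-to-back entries for it. From kdb, the ftdump command prints the
same buffer. Adding trace_printk() calls inside the handler gives more
detail (for example which irq fired) at the cost of a rebuild.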

If this reproduces on mainline and it's not obvious how to fix this, I
don't object to a revert. As per the description of the original
patch, the problem being fixed was fairly rare and I didn't have a way
to reproduce it. The fix seemed safe to me and we've been using it on
Chromebooks based on sc7180, but if it had to get reverted it wouldn't
be the end of the world.

-Doug
