Date:   Wed, 7 Jun 2023 09:44:50 -0400
From:   Brian Masney <bmasney@...hat.com>
To:     Lucas Karpinski <lkarpins@...hat.com>
Cc:     linux-kernel@...r.kernel.org, agross@...nel.org,
        andersson@...nel.org, konrad.dybcio@...aro.org, robh+dt@...nel.org,
        krzysztof.kozlowski+dt@...aro.org, linux-arm-msm@...r.kernel.org,
        devicetree@...r.kernel.org, ahalaney@...hat.com,
        echanude@...hat.com, quic_shazhuss@...cinc.com
Subject: Re: [PATCH] Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a
 node"

Hi Lucas,

On Fri, Jun 02, 2023 at 03:33:21PM -0400, Lucas Karpinski wrote:
> This reverts commit 2eb4cdcd5aba2db83f2111de1242721eeb659f71.

I am all for reverting this commit; however, I think your commit message
needs to be cleaned up.

> The patch introduced a sporadic error where the Qdrive3 will fail to
> boot occasionally due to an rcu preempt stall.
> Qualcomm has disabled pcie2a downstream:
> https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/rh-patch/-/commit/447f2135909683d1385af36f95fae5e1d63a7e2f

Personally I'd remove the mention of the downstream kernel in this case.

Also, your paragraphs are formatted oddly, with a newline at the end
of every sentence. Get them to flow together as a regular paragraph.
This is the relevant line that I have in my muttrc file to help.

set editor="vim -c 'set spell spelllang=en' -c 'set tw=72' -c 'set wrap'"
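
For anyone not composing in vim, the same reflow can be approximated with a
short script. This is only a sketch using Python's standard textwrap module;
the reflow() helper and the sample text are illustrative and not part of any
kernel tooling. It joins the lines of each blank-line-separated paragraph and
re-wraps them at 72 columns. It should not be run over call traces or other
preformatted text, since it would join those lines as well.

```python
import textwrap


def reflow(message: str, width: int = 72) -> str:
    """Join each blank-line-separated paragraph and re-wrap it at `width` columns."""
    paragraphs = message.split("\n\n")
    wrapped = [
        textwrap.fill(
            " ".join(p.split()),      # collapse the per-sentence newlines
            width=width,
            break_long_words=False,   # keep URLs and commit hashes intact
            break_on_hyphens=False,
        )
        for p in paragraphs
    ]
    return "\n\n".join(wrapped)


if __name__ == "__main__":
    # Hypothetical sample with one sentence per line, as in the original message.
    sample = (
        "The patch introduced a sporadic error.\n"
        "Qualcomm has disabled pcie2a downstream.\n"
        "\n"
        "The issue occurs normally once every 3-4 boot cycles.\n"
    )
    print(reflow(sample))
```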

> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:     0-....: (1 GPs behind) idle=77fc/1/0x4000000000000004 softirq=841/841 fqs=2476
> rcu:     (t=5253 jiffies g=-175 q=2552 ncpus=8)
> Call trace:
>  __do_softirq
>  ____do_softirq
>  call_on_irq_stack
>  do_softirq_own_stack
>  __irq_exit_rcu
>  irq_exit_rcu
> 
> The issue occurs normally once every 3-4 boot cycles.
> There is likely a race condition caused when setting up the two pcie
> domains concurrently (pcie2a and pcie3a).

I would also add that Qualcomm told us that upgrading the firmware on
the PCIe switch would correct this issue. We've upgraded the PCIe switch
to the latest firmware and this issue is still present. Apparently we
need to use a specific older version of the firmware that we can't get
from the PCIe switch vendor or Qualcomm.

Nothing is hooked up to pcie2a on the QDrive3 so there's no loss in
functionality by disabling this. We always have to remember to revert
this commit when working with an upstream kernel.

> This is not a solution, so this patch is disabling pcie2a as it seems
> Red Hat are the only ones working on the board,
> we're find with disabling the node until a root cause is found. If
> anyone has further suggestions for debugging, let me know.

This should go under the --- line so that it does not end up in the
commit message.

Brian
