Message-ID: <6m7xpqs73wrlin2ghhviwc4ijb5kyvk7ba2wpflqkgjivv6ol2@z5i5uli3h7f3>
Date:   Mon, 12 Jun 2023 16:05:04 -0400
From:   Eric Chanudet <echanude@...hat.com>
To:     Lucas Karpinski <lkarpins@...hat.com>
Cc:     linux-kernel@...r.kernel.org, agross@...nel.org,
        andersson@...nel.org, konrad.dybcio@...aro.org, robh+dt@...nel.org,
        krzysztof.kozlowski+dt@...aro.org, linux-arm-msm@...r.kernel.org,
        devicetree@...r.kernel.org, ahalaney@...hat.com,
        bmasney@...hat.com, quic_shazhuss@...cinc.com
Subject: Re: [PATCH] Revert "arm64: dts: qcom: sa8540p-ride: enable pcie2a
 node"

On Fri, Jun 02, 2023 at 03:33:21PM -0400, Lucas Karpinski wrote:
> This reverts commit 2eb4cdcd5aba2db83f2111de1242721eeb659f71.
> 
> The patch introduced a sporadic error where the Qdrive3 will fail to
> boot occasionally due to an rcu preempt stall.
> Qualcomm has disabled pcie2a downstream:
> https://git.codelinaro.org/clo/la/platform/vendor/qcom-opensource/rh-patch/-/commit/447f2135909683d1385af36f95fae5e1d63a7e2f
> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:     0-....: (1 GPs behind) idle=77fc/1/0x4000000000000004 softirq=841/841 fqs=2476
> rcu:     (t=5253 jiffies g=-175 q=2552 ncpus=8)
> Call trace:
>  __do_softirq
>  ____do_softirq
>  call_on_irq_stack
>  do_softirq_own_stack
>  __irq_exit_rcu
>  irq_exit_rcu
> 
> The issue occurs normally once every 3-4 boot cycles.
> There is likely a race condition caused when setting up the two pcie
> domains concurrently (pcie2a and pcie3a).
> 
> The issue is not present when only pcie2a is enabled or when only pcie3a
> is enabled.
> A workaround was found that allowed the Qdrive3 to boot with both pcie2a
> and pcie3a enabled.
> Set the .probe_type to PROBE_FORCE_SYNCHRONOUS and add an msleep() to
> the probing function.
> This is not a solution, so this patch disables pcie2a. As it seems
> Red Hat are the only ones working on the board,
> we're fine with disabling the node until a root cause is found. If
> anyone has further suggestions for debugging, let me know.
> 
> Signed-off-by: Lucas Karpinski <lkarpins@...hat.com>
> ---
>  During debugging:
>         - Added additional time for clock/regulator stabilization.
>         - Reduced the bandwidth across pcie2a and pcie3a.
>         - Replaced the interconnect setup from another driver.
>         - The 32-bit/64-bit/config-io space for both pcie2a and pcie3a look to be mapped correctly.
>         - Verified interconnects were started successfully.

I was looking at another issue downstream that triggers a soft lockup on
CPU0, and it turns out this could be the same thing, except that here the
symptoms are less noticeable (the stall only shows up once every 3-4 boot
cycles, as you mention).

Using next-20230609, if I add a return kprobe on dw_handle_msi_irq:

echo 'r:dwmsi_probe dw_handle_msi_irq $retval' > /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/dwmsi_probe/enable 
cat /sys/kernel/debug/tracing/trace_pipe
<idle>-0       [000] d.h1.   690.417268: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417272: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417276: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417281: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417284: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
<idle>-0       [000] d.h1.   690.417288: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0
[...]

dw_handle_msi_irq fires constantly and never returns IRQ_HANDLED (the
return value is always 0x0, i.e. IRQ_NONE). This happens consistently
with either pcie2a or pcie3a alone, after I disable the other one. I
presume having both enabled might be enough to overwhelm the system and
trigger the stall?
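To put a rough number on "constantly", the timestamps in the trace excerpt above can be turned into a mean gap between handler returns. The following is only a sketch: the helper names are made up for illustration, and the parser assumes the default trace_pipe layout (task name without spaces, then CPU, flags, and a SECONDS.USECS timestamp).

```c
#include <stddef.h>
#include <stdio.h>

/* The six trace_pipe lines quoted in this mail, verbatim. */
const char *const sample_trace[] = {
	"<idle>-0       [000] d.h1.   690.417268: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0",
	"<idle>-0       [000] d.h1.   690.417272: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0",
	"<idle>-0       [000] d.h1.   690.417276: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0",
	"<idle>-0       [000] d.h1.   690.417281: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0",
	"<idle>-0       [000] d.h1.   690.417284: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0",
	"<idle>-0       [000] d.h1.   690.417288: dwmsi_probe: (dw_chained_msi_isr+0x38/0xb8 <- dw_handle_msi_irq) arg1=0x0",
};

/*
 * Extract the timestamp of one trace_pipe line, in microseconds.
 * Assumes "TASK-PID [CPU] FLAGS SECONDS.USECS: ..." with a six-digit
 * fractional part. Returns -1 if the line does not match.
 */
long long trace_ts_usec(const char *line)
{
	unsigned long long sec, usec;

	if (sscanf(line, "%*s %*s %*s %llu.%6llu", &sec, &usec) != 2)
		return -1;
	return (long long)(sec * 1000000ULL + usec);
}

/* Mean gap between consecutive events, in microseconds; -1.0 on error. */
double mean_interval_usec(const char *const *lines, size_t n)
{
	long long first, last;

	if (n < 2)
		return -1.0;
	first = trace_ts_usec(lines[0]);
	last = trace_ts_usec(lines[n - 1]);
	if (first < 0 || last < first)
		return -1.0;
	return (double)(last - first) / (double)(n - 1);
}
```

For the six lines quoted above, mean_interval_usec(sample_trace, 6) gives 4 us between returns, which would be on the order of 250,000 handler invocations per second on CPU0 if the excerpt is representative of the whole trace.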

Looking at the handler, the status is always 0 after:
status = dw_pcie_readl_dbi(pci, PCIE_MSI_INTR0_STATUS +
			   (i * MSI_REG_CTRL_BLOCK_SIZE));

Unfortunately I do not know why that is yet.
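
For readers without the source at hand, the link between "status is always 0" and "never returns IRQ_HANDLED" can be illustrated with a small user-space model of the handler's control flow. This is a paraphrase for illustration only, not the driver code: the function name, the enum, the MAX_CTRLS bound and the dispatched counter are all hypothetical, and the status array stands in for the per-controller PCIE_MSI_INTR0_STATUS reads.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MSI_IRQS_PER_CTRL	32
#define MAX_CTRLS		8	/* model-only bound, hypothetical */

enum irqreturn_model { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/*
 * Model of the dispatch loop. status[i] stands in for the value read from
 * PCIE_MSI_INTR0_STATUS + i * MSI_REG_CTRL_BLOCK_SIZE. *dispatched counts
 * the vectors that would be handed on to the MSI irq domain.
 */
enum irqreturn_model model_handle_msi_irq(const uint32_t *status,
					  unsigned int num_ctrls,
					  unsigned int *dispatched)
{
	enum irqreturn_model ret = MODEL_IRQ_NONE;
	unsigned int i, pos;

	assert(num_ctrls <= MAX_CTRLS);
	*dispatched = 0;

	for (i = 0; i < num_ctrls; i++) {
		if (!status[i])
			continue;	/* nothing pending in this block */

		ret = MODEL_IRQ_HANDLED;
		for (pos = 0; pos < MAX_MSI_IRQS_PER_CTRL; pos++)
			if (status[i] & (UINT32_C(1) << pos))
				(*dispatched)++;
	}

	return ret;
}
```

In this model, if every status word reads back as 0, no vector is dispatched and the function returns the IRQ_NONE equivalent, which matches the arg1=0x0 seen in the trace while the chained interrupt keeps firing.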

> 
>  arch/arm64/boot/dts/qcom/sa8540p-ride.dts | 44 -----------------------
>  1 file changed, 44 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> index 24fa449d48a6..d492723ccf7c 100644
> --- a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> +++ b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
> @@ -186,27 +186,6 @@ &i2c18 {
>  	status = "okay";
>  };
>  
> -&pcie2a {
> -	ranges = <0x01000000 0x0 0x3c200000 0x0 0x3c200000 0x0 0x100000>,
> -		 <0x02000000 0x0 0x3c300000 0x0 0x3c300000 0x0 0x1d00000>,
> -		 <0x03000000 0x5 0x00000000 0x5 0x00000000 0x1 0x00000000>;
> -
> -	perst-gpios = <&tlmm 143 GPIO_ACTIVE_LOW>;
> -	wake-gpios = <&tlmm 145 GPIO_ACTIVE_HIGH>;
> -
> -	pinctrl-names = "default";
> -	pinctrl-0 = <&pcie2a_default>;
> -
> -	status = "okay";
> -};
> -
> -&pcie2a_phy {
> -	vdda-phy-supply = <&vreg_l11a>;
> -	vdda-pll-supply = <&vreg_l3a>;
> -
> -	status = "okay";
> -};
> -
>  &pcie3a {
>  	ranges = <0x01000000 0x0 0x40200000 0x0 0x40200000 0x0 0x100000>,
>  		 <0x02000000 0x0 0x40300000 0x0 0x40300000 0x0 0x20000000>,
> @@ -356,29 +335,6 @@ i2c18_default: i2c18-default-state {
>  		bias-pull-up;
>  	};
>  
> -	pcie2a_default: pcie2a-default-state {
> -		perst-pins {
> -			pins = "gpio143";
> -			function = "gpio";
> -			drive-strength = <2>;
> -			bias-pull-down;
> -		};
> -
> -		clkreq-pins {
> -			pins = "gpio142";
> -			function = "pcie2a_clkreq";
> -			drive-strength = <2>;
> -			bias-pull-up;
> -		};
> -
> -		wake-pins {
> -			pins = "gpio145";
> -			function = "gpio";
> -			drive-strength = <2>;
> -			bias-pull-up;
> -		};
> -	};
> -
>  	pcie3a_default: pcie3a-default-state {
>  		perst-pins {
>  			pins = "gpio151";
> -- 
> 2.40.1
> 

-- 
Eric Chanudet
