Message-ID: <20240524145143.GB5758@thinkpad>
Date: Fri, 24 May 2024 20:21:43 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Mrinmay Sarkar <quic_msarkar@...cinc.com>
Cc: andersson@...nel.org, krzysztof.kozlowski+dt@...aro.org,
conor+dt@...nel.org, konrad.dybcio@...aro.org,
quic_shazhuss@...cinc.com, quic_nitegupt@...cinc.com,
quic_ramkri@...cinc.com, quic_nayiluri@...cinc.com,
quic_krichai@...cinc.com, quic_vbadigan@...cinc.com,
quic_schintav@...cinc.com, Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
linux-arm-msm@...r.kernel.org, devicetree@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 1/2] arm64: dts: qcom: sa8775p: Adding iommus property
in pcie DT nodes
On Tue, Apr 30, 2024 at 10:01:50PM +0530, Mrinmay Sarkar wrote:
> 'iommus' is a list of phandle and IOMMU specifier pairs that describe
> the IOMMU master interfaces of the device. Specified this property in
> PCIe DT nodes so that IOMMU can be used for address translation.
>
This patch description is heavily misleading. Even without the 'iommus'
property, there will be IOMMU translation because of 'iommu-map'. And I recently
got rid of the 'iommus' property from all DTs because it is not really required
for the translation (it allows the host bridge to bind to the IOMMU, but that's
not what we want).
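
For reference, the 'iommu-map' lookup can be sketched as below. This is only an illustration of the generic PCI binding, where each entry is (rid-base, iommu phandle, iommu-base, length), and it assumes no 'iommu-map-mask' is in effect. The table values are copied from the first PCIe node in the quoted patch; the helper names rid_to_sid() and bdf_to_rid() are hypothetical and are not kernel functions.

```python
# 'iommu-map' entries of the first PCIe node in the quoted patch,
# written as (rid_base, iommu_base, length); the phandle is omitted.
IOMMU_MAP_PCIE0 = [
    (0x000, 0x0000, 0x1),
    (0x100, 0x0001, 0x1),
]


def bdf_to_rid(bus, dev, fn):
    """Build a 16-bit PCI requester ID from bus/device/function."""
    return (bus << 8) | (dev << 3) | fn


def rid_to_sid(rid, iommu_map):
    """Return the stream ID for a requester ID, or None if no entry covers it."""
    for rid_base, iommu_base, length in iommu_map:
        if rid_base <= rid < rid_base + length:
            return iommu_base + (rid - rid_base)
    return None


if __name__ == "__main__":
    for bdf in [(0, 0, 0), (1, 0, 0), (1, 0, 1)]:
        sid = rid_to_sid(bdf_to_rid(*bdf), IOMMU_MAP_PCIE0)
        print(bdf, "unmapped" if sid is None else hex(sid))
```

Under these assumptions, only requester IDs 0x0 and 0x100 get a stream ID from 'iommu-map'; no 'iommus' property is involved in that lookup.
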
This patch is intended to fix the IOMMU fault that occurs whenever the EP is
attached to the host. But you never described or even mentioned the IOMMU
fault. Please describe the problem clearly and explain how the patch fixes it
in the patch description.
Now for the IOMMU fault, I did some investigation and found that the fault is
happening due to some AER error generated by the bridge whenever the device is
attached to the host. Interestingly, there was no AER IRQ received on the host.
But that can be expected due to the IOMMU fault, as that could've blocked the
AER MSI from reaching the interrupt controller. And 'lspci' shows that the
bridge (and even the device) generated a CE error (RxErr):
CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout+ AdvNonFatalErr-
But I don't know why the IOMMU fault occurs. I also tried to manually inject
AER errors and saw that the AER IRQs are generated correctly. So this confirms
that there is no problem with AER itself.
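
When comparing several such dumps, the status line can be decoded mechanically. The snippet below is a minimal sketch that only assumes the 'Name+' / 'Name-' token format visible in the CESta line quoted above; parse_aer_flags() is a hypothetical helper, not part of pciutils.

```python
def parse_aer_flags(line):
    """Split an lspci status line into {flag_name: is_set}."""
    # Drop the "CESta:" label, then read each "Name+" / "Name-" token.
    _, _, fields = line.partition(":")
    flags = {}
    for token in fields.split():
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return flags


CESTA = "CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout+ AdvNonFatalErr-"

if __name__ == "__main__":
    set_flags = [name for name, on in parse_aer_flags(CESTA).items() if on]
    print(set_flags)  # ['RxErr', 'Timeout']
```

Note that in the quoted line, Timeout is flagged along with RxErr.
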
As an experiment, I reduced the PCIe link speed to Gen 2, and the above error
was gone. So this hints that there could be something wrong with the PHY.
And yes, adding the 'iommus' property indeed makes the IOMMU fault go away, but
I can still see the AER error in lspci, with no actual IRQ received (weird). So
this patch is not really _fixing_ the issue, but just masking it in some form.
Please investigate why the RxErr is being generated and how that ended up as
an IOMMU fault instead of an IRQ.
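
For reference, the two 'iommus' entries in the quoted patch can be read as follows. This is a sketch that assumes the arm,smmu binding with two specifier cells, where the second cell is a stream-match mask whose set bits are ignored during matching. The 16-bit stream ID width and the helper names are assumptions made for illustration only.

```python
def smr_matches(sid, smr_id, smr_mask):
    """True if sid matches (smr_id, smr_mask); bits set in the mask are ignored."""
    return (sid & ~smr_mask) == (smr_id & ~smr_mask)


def matched_range(smr_id, smr_mask, width=16):
    """Return (lowest, highest, count) of stream IDs matched by one entry."""
    hits = [sid for sid in range(1 << width)
            if smr_matches(sid, smr_id, smr_mask)]
    return min(hits), max(hits), len(hits)


if __name__ == "__main__":
    # (id, mask) pairs from the two 'iommus' lines in the quoted patch.
    for smr_id, smr_mask in [(0x0000, 0x7F), (0x0080, 0x7F)]:
        lo, hi, count = matched_range(smr_id, smr_mask)
        print(f"<{smr_id:#06x} {smr_mask:#x}> matches {lo:#06x}..{hi:#06x} ({count} IDs)")
```

Under these assumptions, each 'iommus' entry covers 128 stream IDs, far more than the two IDs per controller listed in 'iommu-map'.
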
- Mani
> Signed-off-by: Mrinmay Sarkar <quic_msarkar@...cinc.com>
> ---
> arch/arm64/boot/dts/qcom/sa8775p.dtsi | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sa8775p.dtsi b/arch/arm64/boot/dts/qcom/sa8775p.dtsi
> index 9065645..0c52180 100644
> --- a/arch/arm64/boot/dts/qcom/sa8775p.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sa8775p.dtsi
> @@ -3666,6 +3666,7 @@
> <&gem_noc MASTER_APPSS_PROC 0 &config_noc SLAVE_PCIE_0 0>;
> interconnect-names = "pcie-mem", "cpu-pcie";
>
> + iommus = <&pcie_smmu 0x0000 0x7f>;
> iommu-map = <0x0 &pcie_smmu 0x0000 0x1>,
> <0x100 &pcie_smmu 0x0001 0x1>;
>
> @@ -3822,6 +3823,7 @@
> <&gem_noc MASTER_APPSS_PROC 0 &config_noc SLAVE_PCIE_1 0>;
> interconnect-names = "pcie-mem", "cpu-pcie";
>
> + iommus = <&pcie_smmu 0x0080 0x7f>;
> iommu-map = <0x0 &pcie_smmu 0x0080 0x1>,
> <0x100 &pcie_smmu 0x0081 0x1>;
>
> --
> 2.7.4
>
--
மணிவண்ணன் சதாசிவம்