Message-ID: <e75f24e7-90a9-5673-cf9a-a1f186193dfc@nxp.com>
Date:   Thu, 18 Nov 2021 14:41:50 +0200
From:   Laurentiu Tudor <laurentiu.tudor@....com>
To:     Daniel Thompson <daniel.thompson@...aro.org>
Cc:     Jon Nettleton <jon@...id-run.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Diana Madalina Craciun <diana.craciun@....com>,
        Ioana Ciornei <ioana.ciornei@....com>, leoyang.li@....com
Subject: Re: [PATCH 2/8] bus: fsl-mc: handle DMA config deferral in ACPI case



On 11/17/2021 7:00 PM, Daniel Thompson wrote:
> On Wed, Nov 17, 2021 at 05:30:32PM +0200, Laurentiu Tudor wrote:
>> On 11/17/2021 3:59 PM, Daniel Thompson wrote:
>>> On Wed, Nov 17, 2021 at 03:07:51PM +0200, Laurentiu Tudor wrote:
>>>> On 11/12/2021 7:31 PM, Daniel Thompson wrote:
>>>>> On Thu, Nov 11, 2021 at 06:36:58PM +0100, Jon Nettleton wrote:
>>>>>> On Thu, Nov 11, 2021 at 6:23 PM Daniel Thompson
>>>>>> <daniel.thompson@...aro.org> wrote:
>>>>>> The correct solution for the problem you are seeing is the ACPI
>>>>>> maintainers figuring out how to land the IORT RMR patchset.  Until
>>>>>> that is done the only workaround is setting "arm-smmu.disable_bypass=0
>>>>>> iommu.passthrough=1" on the kernel command line.  The latter option has
>>>>>> been required since 5.15 and I haven't had time or energy to figure out
>>>>>> why.  The proper solution is to just land the IORT RMR patchset and
>>>>>> let HoneyComb run with the SMMU enabled.
>>>>>
>>>>> Thanks for the update. I'll probably adopt iommu.passthrough=1 for now.
>>>>> That allows me to adopt a distro kernel when it updates to v5.15.
>>>>
>>>> The "iommu.passthrough=1" kernel arg shouldn't be needed. By chance, do
>>>> you remember what errors you were seeing? What was failing?
>>>
>>> For all testing of v5.15 I had "arm-smmu.disable_bypass=0" set because I
>>> was guided to enable that by the error messages in older kernels ;-) .
>>>
>>> Anyhow, without "iommu.passthrough=1" (and without the patch from this
>>> thread reverted), the logs are massively spammed with error messages:
>>>
>>> ~~~
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
>>> arm_smmu_context_fault: 1697259 callbacks suppressed
>>> ~~~
>>>
>>> This results in a relatively simple workstation (LX2 + nVidia GT-710 + USB
>>> for networking) becoming unresponsive. How long it takes to fail is a little
>>> unpredictable. I assumed that the weight of such dense log messages
>>> eventually falls into a timing pattern that prevents any useful
>>> interrupts from being serviced... but that is only a guess.
>>>
>>
>> A few comments here:
>>  - I suspect that the PCI video card is triggering the SMMU faults.
>> Would it be possible to give it a try with the card out and without
>> "iommu.passthrough=1"?
> 
> The PCIe video card does not cause the smmu faults. These still manifest
> when the card is removed (and with same IOVA).
> 
> 
>>  - the IOVAs look weird to me; they should look something like
>> 0xffffxxxxxx. Maybe there are issues in the nvidia driver?
> 
> I guess there could be, but why would a problem that bisects down to
> a change in the fsl-mc-bus initialization configuration alter the
> behaviour of the PCIe graphics driver?
> 
> 
>>  - Would it be possible to share a full boot log? I'm thinking that it
>> would be interesting to see how the devices are allocated in iommu groups.
> 
> See
> https://gist.github.com/daniel-thompson/07489561f14965fd1af7d5bd4340f54b
> 
> It contains three files, all gathered with the GPU removed:
> 
>  * Logs from unmodified v5.15 with iommu.passthrough=1 set
>    (networking is good).
>  * Logs from v5.15 patched with the revert I shared earlier in
>    the thread (networking is good).
>  * Logs from v5.15 without iommu.passthrough=1 set (many SMMU messages,
>    networking is broken).
> 
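
Each of the quoted "Unhandled context fault" lines packs several register values into one line. As a rough triage aid, the sketch below parses one such line and names the set bits of the fault status register (FSR). The bit names are an assumption taken from the SMMUv2 context-bank FSR layout as the Linux arm-smmu driver defines it; `parse_context_fault` and `decode_fsr` are hypothetical helpers, not part of any kernel tool. Bit 10 is deliberately left unnamed rather than guessed.

```python
import re
from typing import List, Optional

# Assumed SMMUv2 context-bank FSR bit layout, mirroring the ARM_SMMU_FSR_*
# definitions in the Linux arm-smmu driver; verify against the SMMU spec.
FSR_BITS = {
    1: "TF",      # translation fault
    2: "AFF",     # access flag fault
    3: "PF",      # permission fault
    4: "EF",      # external fault
    5: "TLBMCF",  # TLB match conflict fault
    6: "TLBLKF",  # TLB lock fault
    7: "ASF",     # address size fault
    8: "UUT",     # unsupported upstream transaction
    30: "SS",     # stall status
    31: "MULTI",  # multiple faults
}

# Matches the field layout of the "Unhandled context fault" lines quoted above.
FAULT_RE = re.compile(
    r"Unhandled context fault: fsr=(0x[0-9a-f]+), iova=(0x[0-9a-f]+), "
    r"fsynr=(0x[0-9a-f]+), cbfrsynra=(0x[0-9a-f]+), cb=(\d+)"
)


def parse_context_fault(line: str) -> Optional[dict]:
    """Return the numeric fields of one context-fault line, or None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    fsr, iova, fsynr, cbfrsynra, cb = m.groups()
    return {
        "fsr": int(fsr, 16),
        "iova": int(iova, 16),
        "fsynr": int(fsynr, 16),
        "cbfrsynra": int(cbfrsynra, 16),
        "cb": int(cb),
    }


def decode_fsr(fsr: int) -> List[str]:
    """Name each set FSR bit; bits not in FSR_BITS are reported by number."""
    return [FSR_BITS.get(bit, f"bit{bit}") for bit in range(32) if fsr & (1 << bit)]


line = ("arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, "
        "iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0")
fault = parse_context_fault(line)
print(decode_fsr(fault["fsr"]))  # ['TF', 'bit10']
```

If the assumed layout holds, the TF bit means a translation fault: the device's access to that IOVA found no valid mapping in context bank 0.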

Ok, it appears there was some confusion on my side, sorry about that.
So, to summarize:
 - the "arm-smmu.disable_bypass=0" workaround is not enough in the ACPI
scenario but works for DT-based boot
 - reverting the patch means the IOMMU is no longer configured for the
MC (the MC device does not get set up in the SMMU), which is why
"arm-smmu.disable_bypass=0" alone is then sufficient
 - for ACPI to boot without "iommu.passthrough=1", the IORT RMR patches
are required
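
The three points above amount to a small decision table. The sketch below encodes it, assuming `arm-smmu.disable_bypass=0` is already set (as it was for all the v5.15 testing in this thread), and checks a kernel command line for the parameters the table calls for. `needs_passthrough` and `missing_params` are hypothetical helpers for illustration, and the example command line is made up.

```python
from typing import Iterable, List

# The two workaround parameters discussed in this thread.
DISABLE_BYPASS = "arm-smmu.disable_bypass=0"
PASSTHROUGH = "iommu.passthrough=1"


def needs_passthrough(firmware: str, rmr_patches: bool, patch_reverted: bool) -> bool:
    """Whether iommu.passthrough=1 is still needed, assuming
    arm-smmu.disable_bypass=0 is already set (per the summary above)."""
    if firmware == "dt":
        # DT boot: disable_bypass=0 alone works.
        return False
    if firmware != "acpi":
        raise ValueError(f"unknown firmware type: {firmware!r}")
    # ACPI boot: either the IORT RMR patches, or reverting this patch so the
    # MC device is never configured in the SMMU, removes the need for it.
    return not (rmr_patches or patch_reverted)


def missing_params(cmdline: str, required: Iterable[str]) -> List[str]:
    """Return the required parameters absent from a kernel command line.

    Simplification: tokens are split on whitespace, so quoted values that
    contain spaces are not handled.
    """
    tokens = set(cmdline.split())
    return [p for p in required if p not in tokens]


# Hypothetical example command line; on a running system it would be
# read from /proc/cmdline instead.
cmdline = "root=/dev/nvme0n1p2 ro arm-smmu.disable_bypass=0"
required = [DISABLE_BYPASS]
if needs_passthrough("acpi", rmr_patches=False, patch_reverted=False):
    required.append(PASSTHROUGH)
print(missing_params(cmdline, required))  # ['iommu.passthrough=1']
```

Whitespace splitting is enough for the simple `key=value` parameters discussed here; a full parser would also need to honour the kernel's quoting rules.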

---
Best Regards, Laurentiu
