lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211117170013.otj3gwfqnmvtlxlu@maple.lan>
Date:   Wed, 17 Nov 2021 17:00:13 +0000
From:   Daniel Thompson <daniel.thompson@...aro.org>
To:     Laurentiu Tudor <laurentiu.tudor@....com>
Cc:     Jon Nettleton <jon@...id-run.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Diana Madalina Craciun <diana.craciun@....com>,
        Ioana Ciornei <ioana.ciornei@....com>, leoyang.li@....com
Subject: Re: [PATCH 2/8] bus: fsl-mc: handle DMA config deferral in ACPI case

On Wed, Nov 17, 2021 at 05:30:32PM +0200, Laurentiu Tudor wrote:
> On 11/17/2021 3:59 PM, Daniel Thompson wrote:
> > On Wed, Nov 17, 2021 at 03:07:51PM +0200, Laurentiu Tudor wrote:
> >> On 11/12/2021 7:31 PM, Daniel Thompson wrote:
> >>> On Thu, Nov 11, 2021 at 06:36:58PM +0100, Jon Nettleton wrote:
> >>>> On Thu, Nov 11, 2021 at 6:23 PM Daniel Thompson
> >>>> <daniel.thompson@...aro.org> wrote:
> >>>> The correct solution for the problem you are seeing is the ACPI
> >>>> maintainers figuring out how to land the IORT RMR patchset.  Until
> >>>> that is done the only workaround is setting "arm-smmu.disable_bypass=0
> >>>> iommu.passthrough=1" on the kernel commandline.  The latter option is
> >>>> required since 5.15 and I haven't had time or energy to figure out
> >>>> why.  The proper solution is to just land the IORT RMR patchset and
> >>>> let HoneyComb run with the SMMU enabled.
> >>>
> >>> Thanks for the update. I'll probably adopt iommu.passthrough=1 for now.
> >>> That allows me to adopt a distro kernel when it updates to v5.15.
> >>
> >> The "iommu.passthrough=1" kernel arg shouldn't be needed. By chance, do
> >> you remember what errors were you seeing? What was failing?
> > 
> > For all testing of v5.15 I had "arm-smmu.disable_bypass=0" set because I
> > was guided to enable that by the error messages in older kernels ;-) .
> > 
> > Anyhow without "iommu.passthrough=1" (and without the patch from this thread
> > reverted) then the logs are being massively spammed with error messages:
> > 
> > ~~~
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm-smmu arm-smmu.0.auto: Unhandled context fault: fsr=0x402, iova=0x23e0000100, fsynr=0x20040, cbfrsynra=0x4000, cb=0
> > arm_smmu_context_fault: 1697259 callbacks suppressed
> > ~~~
> > 
> > This results a relatively simple workstation (LX2 + nVidia GT-710 + USB
> > for networking) becoming unresponsive. How long to fail is a little
> > unpredictable. I assumed that the weight of such dense log messages
> > eventually gets into a timing pattern that prevented any useful
> > interrupts from being serviced... but that is only a guess.
> > 
> 
> Few comments here:
>  - I'm suspecting that the PCI video card is triggering the smmu faults.
> Would it be possible to give it a try with the card out and without
> "iommu.passthrough=1"?

The PCIe video card does not cause the smmu faults. These still manifest
when the card is removed (and with same IOVA).


>  - the IOVAs look weird to me, they should look something like
> 0xffffxxxxxx or so. Maybe there are issues in the nvidia driver?

I guess there could be, but why would a problem that bisects down to
a change in the fsl-mc-bus initialization configuration alter the
behaviour of the PCIe graphics driver?


>  - Would it be possible to share a full boot log? I'm thinking that it
> would be interesting to see how the devices are allocated in iommu groups.

See
https://gist.github.com/daniel-thompson/07489561f14965fd1af7d5bd4340f54b

It contains three files, all gathered with the GPU removed:

 * Logs from unmodified v5.15 with iommu.passthrough=1 set
   (networking is good).
 * Logs from v5.15 patched with the revert I shared earlier in
   the thread (networking is good).
 * Logs from v5.15 without iommu.passthough=1 set (many SMMU messages,
   networking is broken).


Daniel.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ