lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 24 Feb 2021 22:27:43 -0800
From:   Nicolin Chen <nicoleotsuka@...il.com>
To:     Dmitry Osipenko <digetx@...il.com>
Cc:     joro@...tes.org, thierry.reding@...il.com, will@...nel.org,
        guillaume.tucker@...labora.com, vdumpa@...dia.com,
        jonathanh@...dia.com, linux-tegra@...r.kernel.org,
        iommu@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

On Tue, Feb 23, 2021 at 08:10:41AM +0300, Dmitry Osipenko wrote:
> 23.02.2021 05:13, Nicolin Chen пишет:
> > Hi Dmitry,
> > 
> > On Sat, Feb 20, 2021 at 08:16:22AM +0300, Dmitry Osipenko wrote:
> >> 19.02.2021 01:07, Nicolin Chen пишет:
> >>> Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> >>> removed certain hack in the tegra_smmu_probe() by relying on IOMMU core to
> >>> of_xlate SMMU's SID per device, so as to get rid of tegra_smmu_find() and
> >>> tegra_smmu_configure() that are typically done in the IOMMU core also.
> >>>
> >>> This approach works for both existing devices that have DT nodes and other
> >>> devices (like PCI device) that don't exist in DT, on Tegra210 and Tegra3
> >>> upon testing. However, Page Fault errors are reported on tegra124-Nyan:
> >>>
> >>>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> >>> 	 EMEM address decode error (SMMU translation error [--S])
> >>>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> >>> 	 Page fault (SMMU translation error [--S])
> >>>
> >>> After debugging, I found that the mentioned commit changed some function
> >>> callback sequence of tegra-smmu's, resulting in enabling SMMU for display
> >>> client before display driver gets initialized. I couldn't reproduce exact
> >>> same issue on Tegra210 as Tegra124 (arm-32) differs at arch-level code.
> >>
> >> Hello Nicolin,
> >>
> >> Could you please explain in a more details what exactly makes the
> >> difference for the callback sequence?
> > 
> > Here is a log with 5.11.0-rc6:
> > https://lava.collabora.co.uk/scheduler/job/3187849
> > [dump_stack was added in some tegra-smmu functions]
> > 
> > And here is a corresponding log with reverting the original commit:
> > https://lava.collabora.co.uk/scheduler/job/3187851
> > 
> > Here is a log with 5.11.0-rc7-next-20210210:
> > https://lava.collabora.co.uk/scheduler/job/3210245
> > 
> > And here is a corresponding log with reverting the original commit:
> > https://lava.collabora.co.uk/scheduler/job/3210596
> > 
> > Both failing logs show that mc errors started right after client DC
> > got enabled by ->attach_dev() callback that in the passing logs was
> > not called until Host1x driver init. And note that two failing logs
> > show that ->attach_dev() could be called from two different sources,
> > of_dma_configure_id() or arch_setup_dma_ops().
> > 
> > The reason why ->attach_dev() gets called is probably related to the
> > following reasons (sorry, can't be 100% sure as I don't have Tegra124
> > or other 32bit Tegra board to test):
> > 1) With the commit reverted, all clients are probed in "arch" stage,
> >    which is even prior to iommu core initialization -- including it
> >    setting default domain type. This probably messed up the type of
> >    allocating domains against the default domain type. Also internal
> >    group is somehow affected. So some condition check in iommu core
> >    failed and then it bypassed ->attach_dev callback in really_probe
> >    stage, until Host1x driver does attach_dev again.
> > 
> > 2) 32bit ARM has arch_setup_dma_ops() does an additional set of iommu
> >    domain allocation + attach_dev(), after of_dma_configure_id() did
> >    once. This isn't reproducible for me on Tegra210.
> > 
> > As debugging online isn't very efficient, and given that Thierry has
> > been working on the linear mapping of framebuffer carveout, I choose
> > to partially revert as a quick fix.
> 
> The partially revert should be okay, but it's not clear to me what makes
> difference for T124 since I don't see that problem on T30, which also
> has active display at a boot time.

Hmm..do you see ->attach_dev() is called from host1x_client_iommu_attach
or from of_dma_configure_id/arch_setup_dma_ops?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ