Message-ID: <d6cd5e64-e2c0-4c6e-9c89-ce8b3e0a4a5b@arm.com>
Date: Mon, 17 Mar 2025 18:22:32 +0000
From: Robin Murphy <robin.murphy@....com>
To: Marek Szyprowski <m.szyprowski@...sung.com>,
Lorenzo Pieralisi <lpieralisi@...nel.org>, Hanjun Guo
<guohanjun@...wei.com>, Sudeep Holla <sudeep.holla@....com>,
"Rafael J. Wysocki" <rafael@...nel.org>, Len Brown <lenb@...nel.org>,
Russell King <linux@...linux.org.uk>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Danilo Krummrich <dakr@...nel.org>, Stuart Yoder <stuyoder@...il.com>,
Laurentiu Tudor <laurentiu.tudor@....com>, Nipun Gupta
<nipun.gupta@....com>, Nikhil Agarwal <nikhil.agarwal@....com>,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Rob Herring <robh@...nel.org>, Saravana Kannan <saravanak@...gle.com>,
Bjorn Helgaas <bhelgaas@...gle.com>
Cc: linux-acpi@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
devicetree@...r.kernel.org, linux-pci@...r.kernel.org,
Charan Teja Kalla <quic_charante@...cinc.com>
Subject: Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe path
On 17/03/2025 7:37 am, Marek Szyprowski wrote:
> On 13.03.2025 15:12, Robin Murphy wrote:
>> On 2025-03-13 1:06 pm, Robin Murphy wrote:
>>> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>>>> On 13.03.2025 12:01, Robin Murphy wrote:
>>>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>>>> [...]
>>>>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my
>>>>>> tests I found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>>>> relevant kernel log:
>>>>>
>>>>> ...and the bug-flushing-out begins!
>>>>>
>>>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000000000003e8
>>>>>> Mem abort info:
>>>>>> ESR = 0x0000000096000004
>>>>>> EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>> SET = 0, FnV = 0
>>>>>> EA = 0, S1PTW = 0
>>>>>> FSC = 0x04: level 0 translation fault
>>>>>> Data abort info:
>>>>>> ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>> CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>>> [00000000000003e8] user address but active_mm is swapper
>>>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>>> Modules linked in:
>>>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>> pc : devm_kmalloc+0x2c/0x114
>>>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>>>> ...
>>>>>> Call trace:
>>>>>> devm_kmalloc+0x2c/0x114 (P)
>>>>>> rk_iommu_of_xlate+0x30/0x90
>>>>>
>>>>> Yeah, looks like this is doing something a bit questionable which can't
>>>>> work properly. TBH the whole dma_dev thing could probably be cleaned up
>>>>> now that we have proper instances, but for now does this work?
>>>>
>>>> Yes, this patch fixes the problem I've observed.
>>>>
>>>> Reported-by: Marek Szyprowski <m.szyprowski@...sung.com>
>>>> Tested-by: Marek Szyprowski <m.szyprowski@...sung.com>
>>>>
>>>> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver and
>>>> I doubt it can be cleaned up.
>>>
>>> On the contrary I suspect they both can - it all dates back to when
>>> we had the single global platform bus iommu_ops and the SoC drivers
>>> were forced to bodge their own notion of multiple instances, but with
>>> the modern core code, ops are always called via a valid IOMMU
>>> instance or domain, so in principle it should always be possible to
>>> get at an appropriate IOMMU device now. IIRC it was mostly about
>>> allocating and DMA-mapping the pagetables in domain_alloc, where the
>>> private notion of instances didn't have enough information, but
>>> domain_alloc_paging solves that.
>>
>> Bah, in fact I think I am going to have to do that now, since although
>> it doesn't crash, rk_domain_alloc_paging() will also be failing for
>> the same reason. Time to find a PSU for the RK3399 board, I guess...
>>
>> (Or maybe just move the dma_dev assignment earlier to match Exynos?)
>
> Well I just found that Exynos IOMMU is also broken on some of my test
> boards. It looks like the runtime PM links are somehow not correctly
> established. I will try to analyze this later in the afternoon.
Hmm, I tried to get an Odroid-XU3 up and running, but it seems unable to
boot my original 6.14-rc3-based branch - even with the IOMMU driver
disabled, it's consistently dying somewhere near (or just after) init
with what looks like some catastrophic memory corruption issue - very
occasionally it's managed to print the first line of various different
panics.
Before that point though, with the IOMMU driver enabled it does appear
to show signs of working OK:
[ 0.649703] exynos-sysmmu 14650000.sysmmu: hardware version: 3.3
[ 0.654220] platform 14450000.mixer: Adding to iommu group 1
...
[ 2.680920] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42924000
...
[ 5.196674] exynos-mixer 14450000.mixer: exynos_iommu_identity_attach: Restored IOMMU to IDENTITY from pgtable 0x42924000
[ 5.207091] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42884000
The multi-instance stuff in probe/release does look a bit suspect,
however - seems like the second instance probe would overwrite the first
instance's links, and then there would be a double-del() if the device
were ever actually released again? I may have made that much more likely
to happen, but I suspect it was already possible with async driver probe...
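To make the suspected pattern concrete, here is a rough userspace sketch of
"second probe overwrites the stored link, then release deletes it twice".
It is only a toy model under my own assumptions: none of these names
(toy_link, toy_owner, toy_probe, toy_release, toy_run) exist in the driver,
and it does not claim to mirror how exynos-iommu actually stores its links.

```c
/* Userspace toy model; NOT code from the exynos-iommu driver.
 * All names here (toy_link, toy_owner, ...) are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct toy_link {
	int live;       /* 1 while the link exists */
	int del_count;  /* how many times it has been deleted */
};

/* One slot per client device: room for only a single link pointer. */
struct toy_owner {
	struct toy_link *link;
};

/* Each instance probe creates a link and stores it in the same slot,
 * silently dropping whatever pointer was stored there before. */
static void toy_probe(struct toy_owner *o, struct toy_link *l)
{
	assert(o && l);
	l->live = 1;
	o->link = l;
}

/* Each release deletes whatever the slot currently points at. */
static void toy_release(struct toy_owner *o)
{
	assert(o && o->link);
	o->link->live = 0;
	o->link->del_count++;
}

/* Probe two instances, then release twice. Returns how many deletes the
 * second link received and reports whether the first link is still live. */
int toy_run(int *first_still_live)
{
	struct toy_link a = { 0, 0 }, b = { 0, 0 };
	struct toy_owner o = { NULL };

	toy_probe(&o, &a);
	toy_probe(&o, &b);	/* overwrites the pointer to a */
	toy_release(&o);
	toy_release(&o);	/* second delete of b; a is never deleted */

	*first_still_live = a.live;
	return b.del_count;
}
```

In this model the first link is leaked and the second is deleted twice;
keeping one stored link per instance, rather than one shared slot, would
avoid both.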
Thanks,
Robin.