Message-ID: <135390b1-c0c5-4595-a3f3-1fb376473872@arm.com>
Date: Fri, 21 Mar 2025 16:48:46 +0000
From: Robin Murphy <robin.murphy@....com>
To: Marek Szyprowski <m.szyprowski@...sung.com>,
 Lorenzo Pieralisi <lpieralisi@...nel.org>, Hanjun Guo
 <guohanjun@...wei.com>, Sudeep Holla <sudeep.holla@....com>,
 "Rafael J. Wysocki" <rafael@...nel.org>, Len Brown <lenb@...nel.org>,
 Russell King <linux@...linux.org.uk>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Danilo Krummrich <dakr@...nel.org>, Stuart Yoder <stuyoder@...il.com>,
 Nipun Gupta <nipun.gupta@....com>, Nikhil Agarwal <nikhil.agarwal@....com>,
 Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
 Rob Herring <robh@...nel.org>, Saravana Kannan <saravanak@...gle.com>,
 Bjorn Helgaas <bhelgaas@...gle.com>
Cc: linux-acpi@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
 devicetree@...r.kernel.org, linux-pci@...r.kernel.org,
 Charan Teja Kalla <quic_charante@...cinc.com>
Subject: Re: [PATCH v2 4/4] iommu: Get DT/ACPI parsing into the proper probe
 path

On 21/03/2025 12:15 pm, Marek Szyprowski wrote:
> On 17.03.2025 19:22, Robin Murphy wrote:
>> On 17/03/2025 7:37 am, Marek Szyprowski wrote:
>>> On 13.03.2025 15:12, Robin Murphy wrote:
>>>> On 2025-03-13 1:06 pm, Robin Murphy wrote:
>>>>> On 2025-03-13 12:23 pm, Marek Szyprowski wrote:
>>>>>> On 13.03.2025 12:01, Robin Murphy wrote:
>>>>>>> On 2025-03-13 9:56 am, Marek Szyprowski wrote:
>>>>>>> [...]
>>>>>>>> This patch landed in yesterday's linux-next as commit bcb81ac6ae3c
>>>>>>>> ("iommu: Get DT/ACPI parsing into the proper probe path"). In my
>>>>>>>> tests I found it breaks booting of ARM64 RK3568-based Odroid-M1 board
>>>>>>>> (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts). Here is the
>>>>>>>> relevant kernel log:
>>>>>>>
>>>>>>> ...and the bug-flushing-out begins!
>>>>>>>
>>>>>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000000000003e8
>>>>>>>> Mem abort info:
>>>>>>>>        ESR = 0x0000000096000004
>>>>>>>>        EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>>>        SET = 0, FnV = 0
>>>>>>>>        EA = 0, S1PTW = 0
>>>>>>>>        FSC = 0x04: level 0 translation fault
>>>>>>>> Data abort info:
>>>>>>>>        ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>>>>        CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>>>>        GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>>>>> [00000000000003e8] user address but active_mm is swapper
>>>>>>>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>>>>> Modules linked in:
>>>>>>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #15533
>>>>>>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>>>>>>> pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>>> pc : devm_kmalloc+0x2c/0x114
>>>>>>>> lr : rk_iommu_of_xlate+0x30/0x90
>>>>>>>> ...
>>>>>>>> Call trace:
>>>>>>>>       devm_kmalloc+0x2c/0x114 (P)
>>>>>>>>       rk_iommu_of_xlate+0x30/0x90
>>>>>>>
>>>>>>> Yeah, looks like this is doing something a bit questionable which
>>>>>>> can't work properly. TBH the whole dma_dev thing could probably be
>>>>>>> cleaned up now that we have proper instances, but for now does this
>>>>>>> work?
>>>>>>
>>>>>> Yes, this patch fixes the problem I've observed.
>>>>>>
>>>>>> Reported-by: Marek Szyprowski <m.szyprowski@...sung.com>
>>>>>> Tested-by: Marek Szyprowski <m.szyprowski@...sung.com>
>>>>>>
>>>>>> BTW, this dma_dev idea has been borrowed from my exynos_iommu driver
>>>>>> and I doubt it can be cleaned up.
>>>>>
>>>>> On the contrary I suspect they both can - it all dates back to when
>>>>> we had the single global platform bus iommu_ops and the SoC drivers
>>>>> were forced to bodge their own notion of multiple instances, but with
>>>>> the modern core code, ops are always called via a valid IOMMU
>>>>> instance or domain, so in principle it should always be possible to
>>>>> get at an appropriate IOMMU device now. IIRC it was mostly about
>>>>> allocating and DMA-mapping the pagetables in domain_alloc, where the
>>>>> private notion of instances didn't have enough information, but
>>>>> domain_alloc_paging solves that.
>>>>
>>>> Bah, in fact I think I am going to have to do that now, since although
>>>> it doesn't crash, rk_domain_alloc_paging() will also be failing for
>>>> the same reason. Time to find a PSU for the RK3399 board, I guess...
>>>>
>>>> (Or maybe just move the dma_dev assignment earlier to match Exynos?)
>>>
>>> Well I just found that Exynos IOMMU is also broken on some of my test
>>> boards. It looks like the runtime PM links are somehow not correctly
>>> established. I will try to analyze this later in the afternoon.
>>
>> Hmm, I tried to get an Odroid-XU3 up and running, but it seems unable
>> to boot my original 6.14-rc3-based branch - even with the IOMMU driver
>> disabled, it's consistently dying somewhere near (or just after) init
>> with what looks like some catastrophic memory corruption issue - very
>> occasionally it's managed to print the first line of various different
>> panics.
>>
>> Before that point though, with the IOMMU driver enabled it does appear
>> to show signs of working OK:
>>
>> [    0.649703] exynos-sysmmu 14650000.sysmmu: hardware version: 3.3
>> [    0.654220] platform 14450000.mixer: Adding to iommu group 1
>> ...
>> [    2.680920] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42924000
>> ...
>> [    5.196674] exynos-mixer 14450000.mixer: exynos_iommu_identity_attach: Restored IOMMU to IDENTITY from pgtable 0x42924000
>> [    5.207091] exynos-mixer 14450000.mixer: exynos_iommu_attach_device: Attached IOMMU with pgtable 0x42884000
>>
>>
>> The multi-instance stuff in probe/release does look a bit suspect,
>> however - seems like the second instance probe would overwrite the
>> first instance's links, and then there would be a double-del() if the
>> device were ever actually released again? I may have made that much
>> more likely to happen, but I suspect it was already possible with
>> async driver probe...
> 
> That is really strange. My Odroid XU3 boots fine from commit
> bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path"),
> although the IOMMU seems not to be working correctly. I've tested this
> with the 14450000.mixer device (one needs to attach an HDMI cable to get
> it activated) and it looks like the video data are not being read from
> memory at all (the lack of VSYNC is reported, no IOMMU fault). However,
> from time to time, everything initializes and works properly.

Urgh, seems my mistake was assuming exynos_defconfig was the right thing 
to begin from - bcb81ac6ae3c with that still dies in the same way (this 
time I saw a hint of spin_bug() being hit...), however a 
multi_v7_defconfig build does get to userspace OK again with no obvious 
signs of distress:

[root@...rm ~]# grep -Hr . /sys/kernel/iommu_groups/*/type
/sys/kernel/iommu_groups/0/type:identity
/sys/kernel/iommu_groups/1/type:identity
/sys/kernel/iommu_groups/10/type:identity
/sys/kernel/iommu_groups/2/type:identity
/sys/kernel/iommu_groups/3/type:identity
/sys/kernel/iommu_groups/4/type:identity
/sys/kernel/iommu_groups/5/type:identity
/sys/kernel/iommu_groups/6/type:identity
/sys/kernel/iommu_groups/7/type:identity
/sys/kernel/iommu_groups/8/type:identity
/sys/kernel/iommu_groups/9/type:identity
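
(Editorial aside, not part of the original message: the grep above lists groups in lexical order, so group 10 lands between 1 and 2. A minimal Python sketch of the same check in numeric order follows. It assumes only the /sys/kernel/iommu_groups/<N>/type layout visible in the output above; the helper name iommu_group_types is hypothetical.)

```python
from pathlib import Path


def iommu_group_types(base="/sys/kernel/iommu_groups"):
    """Return {group_number: type_string} for every group under base.

    Hypothetical helper: it only assumes the <base>/<N>/type layout
    shown in the grep output above.
    """
    groups = {}
    root = Path(base)
    if not root.is_dir():
        return groups  # no IOMMU groups exposed on this system
    for entry in root.iterdir():
        type_file = entry / "type"
        # Skip anything that is not a numbered group with a type file.
        if entry.name.isdigit() and type_file.is_file():
            groups[int(entry.name)] = type_file.read_text().strip()
    return groups


if __name__ == "__main__":
    # Sort by group number rather than by path string.
    for number, kind in sorted(iommu_group_types().items()):
        print(f"group {number}: {kind}")
```

(Passing a different base directory makes it possible to try the helper against a copied or mocked sysfs tree.)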

Annoyingly I do have an adapter for the fiddly micro-HDMI, but it's at 
home :(

> It looks like this is somehow related to the different IOMMU/DMA-mapping
> glue code, as the other boards (ARM64 based) with exactly the same
> Exynos IOMMU driver always work fine. I've tried to figure out what
> actually happens, but so far I haven't found anything conclusive.
> Disabling the call to dev->bus->dma_configure(dev) from
> iommu_init_device() seems to fix this, but that is almost equivalent to
> reverting the $subject patch. I don't get why calling it in
> iommu_init_device() causes problems. It also doesn't look like this is
> related to the multi-instance stuff, as the same happens if I only leave
> a single exynos-sysmmu instance and its client (only the 14450000.mixer
> device in the system).

On a hunch I stuck a print in exynos_iommu_probe_device(), and it looks 
like in fact device_link_add() isn't getting called at all, and indeed 
your symptoms do sound like they could be explained by the IOMMU not 
being reliably resumed... lemme stare at exynos_iommu_of_xlate() a bit 
longer...

Thanks,
Robin.
