lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2e2ad90b-9e48-4711-a8da-85668493259b@arm.com>
Date: Thu, 10 Apr 2025 12:14:48 +0100
From: Robin Murphy <robin.murphy@....com>
To: Chen-Yu Tsai <wenst@...omium.org>
Cc: Joerg Roedel <joro@...tes.org>,
 Louis-Alexis Eyraud <louisalexis.eyraud@...labora.com>,
 Yong Wu <yong.wu@...iatek.com>, Will Deacon <will@...nel.org>,
 Matthias Brugger <matthias.bgg@...il.com>,
 AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
 Bjorn Helgaas <bhelgaas@...gle.com>,
 Lorenzo Pieralisi <lpieralisi@...nel.org>,
 "Rob Herring (Arm)" <robh@...nel.org>, kernel@...labora.com,
 Joerg Roedel <jroedel@...e.de>, Jason Gunthorpe <jgg@...pe.ca>,
 iommu@...ts.linux.dev, linux-mediatek@...ts.infradead.org,
 linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v2] iommu/mediatek: Fix NULL pointer deference in
 mtk_iommu_device_group

On 08/04/2025 5:46 am, Chen-Yu Tsai wrote:
> On Mon, Apr 7, 2025 at 8:38 PM Robin Murphy <robin.murphy@....com> wrote:
>>
>> On 2025-04-07 6:17 am, Chen-Yu Tsai wrote:
>>> Hi,
>>>
>>> On Thu, Apr 3, 2025 at 6:24 PM Louis-Alexis Eyraud
>>> <louisalexis.eyraud@...labora.com> wrote:
>>>>
>>>> Currently, mtk_iommu calls during probe iommu_device_register before
>>>> the hw_list from driver data is initialized. Since iommu probing issue
>>>> fix, it leads to NULL pointer dereference in mtk_iommu_device_group when
>>>> hw_list is accessed with list_first_entry (not null safe).
>>>>
>>>> So, change the call order to ensure iommu_device_register is called
>>>> after the driver data are initialized.
>>>>
>>>> Fixes: 9e3a2a643653 ("iommu/mediatek: Adapt sharing and non-sharing pgtable case")
>>>> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
>>>> Reviewed-by: Yong Wu <yong.wu@...iatek.com>
>>>> Tested-by: Chen-Yu Tsai <wenst@...omium.org> # MT8183 Juniper, MT8186 Tentacruel
>>>> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
>>>> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>
>>>> Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@...labora.com>
>>>> ---
>>>> This patch fixes a NULL pointer dereference that occurs during the
>>>> mtk_iommu driver probe and observed at least on several Mediatek Genio boards:
>>>> ```
>>>> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>>>
>>> This is a reminder to please land this and send to Linus ASAP.
>>>
>>> This fixes the v6.15-rc1 kernel on all the MediaTek Chromebook platforms,
>>> except for MT8188, which seems to have another issue in iommu_get_dma_domain()
>>> used from the DRM driver:
>>>
>>>       Disabling lock debugging due to kernel taint
>>>       Unable to handle kernel NULL pointer dereference at virtual
>>> address 0000000000000158
>>
>>   From the offset and the stacktrace code dump, this would appear to be
>> the dereference of dev->iommu_group->default_domain, when
>> dev->iommu_group is NULL (and CONFIG_DEBUG_LOCK_ALLOC makes the mutex
>> really big). Which is a bit weird, as to get into iommu-dma at all in
>> that state would suggest that whatever device this is has been removed
>> and had its group torn down again after iommu_setup_dma_ops() has run...
>> but either way that implies the DRM driver is passing an arbitrary
>> device to the DMA API without making sure it's actualy valid.
>>
>> Trying to trace the provenance of dma_dev from mtk_gem_create() back
>> through the rest of the driver is quite the rabbit-hole, but it seems
>> like in at least one case it can lead back to an
>> of_find_device_by_node() in ovl_adaptor_comp_init(), which definitely
>> looks sufficiently sketchy.
> 
> It kind of makes sense since the "display controller" is composed of
> many individual hardware blocks. The struct device tied to the DRM
> driver is more or less just a place holder. Only the first block,
> either the OVL (overlay compositing engine) or RDMA (scanout engine)
> accesses memory, so I think it makes sense to use that as the dma_dev.

I'm not disputing whether the choice of dma_dev is semantically 
appropriate, I'm just saying that the method of pulling a struct device 
reference out of the DT topology shows no *obvious* guarantee that that 
specific device will have a driver bound and be validly configured 
before that dma_dev can be used via any other path. Especially when it's 
all happening off the back of another fake device created by the fake 
DRM device itself.

Maybe there's some hidden magic in all the component stuff which makes 
it work out fine, I don't know. This was just an observation since I 
went looking for potential bugs and found something which at first 
glance *looks* rather fragile, compared to, say, if mtk_mdp_rdma's own 
probe() or bind() were to directly set itself as the DMA device.

> With some more logs, I did find something else fishy. Here the IOMMU
> for the second display pipeline fails to probe:
> 
>      mtk-iommu 1c028000.iommu: error -EINVAL: Failed to register IOMMU
>      mtk-iommu 1c028000.iommu: probe with driver mtk-iommu failed with error -22
> 
> Then later on, deferred probe times out, and the display pipeline is
> brought up regardless:
> 
>      mediatek-disp-ovl 1c000000.ovl: deferred probe timeout, ignoring dependency
>      mediatek-disp-ovl 1c000000.ovl: Adding to IOMMU failed: -110
>      mediatek-disp-rdma 1c002000.rdma: deferred probe timeout, ignoring
> dependency
>      mediatek-disp-rdma 1c002000.rdma: Adding to IOMMU failed: -110
>      (repeats for all the individual components of the display pipeline)
>      mediatek-drm mediatek-drm.16.auto: bound 1c000000.ovl (ops
> mtk_disp_ovl_component_ops)
>      mediatek-drm mediatek-drm.16.auto: bound 1c002000.rdma (ops
> mtk_disp_rdma_component_ops)
>      (repeats for all the individual components of the display pipeline)
>      mediatek-drm mediatek-drm.16.auto: DMA device is 1c000000.ovl
>      [drm] Initialized mediatek 1.0.0 for mediatek-drm.16.auto on minor 1
> 
> And all without a functional IOMMU.

Ah, that I was not expecting, and it does indeed explain how things get 
into that state.

> So I think this brings up two more questions:
> 
> 1. Why is the IOMMU failing to probe?
> 2. Why is the core code still going the IOMMU DMA alloc path if there
>     is no usable IOMMU?
> 
> I'll look into the first question first. Insights welcome for the second
> one.

I had a think about it, and although there's very much a historical mess 
in this area, I'm inclined to agree that we could do better. Patch sent.

Cheers,
Robin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ