linux-kernel - Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

Message-ID: <a72b0a47-9de9-ebb9-e0eb-70e3bb20942a@samsung.com>
Date:   Thu, 1 Jul 2021 11:26:45 +0200
From:   Marek Szyprowski <m.szyprowski@...sung.com>
To:     Robin Murphy <robin.murphy@....com>, Will Deacon <will@...nel.org>
Cc:     Jean-Philippe Brucker <jean-philippe@...aro.org>,
        linux-arm-msm@...r.kernel.org, iommu@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org,
        Amey Narkhede <ameynarkhede03@...il.com>,
        Jon Hunter <jonathanh@...dia.com>,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH] iommu/arm: Cleanup resources in case of probe error
 path

On 01.07.2021 11:11, Robin Murphy wrote:
> On 2021-07-01 10:01, Will Deacon wrote:
>> On Thu, Jul 01, 2021 at 10:29:29AM +0200, Marek Szyprowski wrote:
>>> Hi Robin,
>>>
>>> On 30.06.2021 16:01, Robin Murphy wrote:
>>>> On 2021-06-30 14:48, Marek Szyprowski wrote:
>>>>> On 30.06.2021 14:59, Will Deacon wrote:
>>>>>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
>>>>>>> On 08.06.2021 18:45, Amey Narkhede wrote:
>>>>>>>> If device registration fails, remove sysfs attribute
>>>>>>>> and if setting bus callbacks fails, unregister the device
>>>>>>>> and cleanup the sysfs attribute.
>>>>>>>>
>>>>>>>> Signed-off-by: Amey Narkhede <ameynarkhede03@...il.com>
>>>>>>> This patch landed in linux-next some time ago as commit 249c9dc6aa0d
>>>>>>> ("iommu/arm: Cleanup resources in case of probe error path"). After
>>>>>>> bisecting and some manual searching I finally found that it is
>>>>>>> responsible for breaking s2idle on DragonBoard 410c. Here is the log
>>>>>>> (captured with no_console_suspend):
>>>>>>>
>>>>>>> # time rtcwake -s10 -mmem
>>>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:02:13 1970
>>>>>>> PM: suspend entry (s2idle)
>>>>>>> Filesystems sync: 0.002 seconds
>>>>>>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>>>>>>> OOM killer disabled.
>>>>>>> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
>>>>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>>>>> 0000000000000070
>>>>>>> Mem abort info:
>>>>>>>   ESR = 0x96000006
>>>>>>>   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>>   SET = 0, FnV = 0
>>>>>>>   EA = 0, S1PTW = 0
>>>>>>>   FSC = 0x06: level 2 translation fault
>>>>>>> Data abort info:
>>>>>>>   ISV = 0, ISS = 0x00000006
>>>>>>>   CM = 0, WnR = 0
>>>>>>> user pgtable: 4k pages, 48-bit VAs, pgdp=000000008ad08000
>>>>>>> [0000000000000070] pgd=0800000085c3c003, p4d=0800000085c3c003,
>>>>>>> pud=0800000088dcf003, pmd=0000000000000000
>>>>>>> Internal error: Oops: 96000006 [#1] PREEMPT SMP
>>>>>>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
>>>>>>> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
>>>>>>> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
>>>>>>> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
>>>>>>> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
>>>>>>> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
>>>>>>> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
>>>>>>> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci qnoc_msm8916
>>>>>>> videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
>>>>>>> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
>>>>>>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>>>>>>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>>>>>> pc : msm_runtime_suspend+0x1c/0x60 [msm]
>>>>>>> lr : msm_pm_suspend+0x18/0x38 [msm]
>>>>>>> ...
>>>>>>> Call trace:
>>>>>>>  msm_runtime_suspend+0x1c/0x60 [msm]
>>>>>>>  msm_pm_suspend+0x18/0x38 [msm]
>>>>>>>  dpm_run_callback+0x84/0x378
>>>>>> I wonder if we're missing a pm_runtime_disable() call on the failure
>>>>>> path?
>>>>>> i.e. something like the diff below...
>>>>>
>>>>> I've checked and it doesn't fix anything.
>>>>
>>>> What's happened previously? Has an IOMMU actually failed to probe, or
>>>> is this a fiddly "code movement unveils latent bug elsewhere" kind of
>>>> thing? There doesn't look to be much capable of going wrong in
>>>> msm_runtime_suspend() itself, so is the DRM driver also in a broken
>>>> half-probed state where it's left its pm_runtime_ops behind without
>>>> its drvdata being valid?
>>>>
>>> I finally had some time to analyze this issue. It turned out that with
>>> this patch, iommu fails to probe for soc:iommu@...8000 device, while it
>>> worked fine before. This happens because this patch adds a check for the
>>> return value of bus_set_iommu() in
>>> drivers/iommu/arm/arm-smmu/qcom_iommu.c. When I removed that check, it
>>> probes successfully again. It looks like there are already iommu ops
>>> registered for platform bus, before qcom_iommu probes. On the other
>>> hand, if I remember correctly they are not used during the device
>>> registration, but they are needed for some legacy stuff. I can send a
>>> patch restoring old code flow if you think that this is the right
>>> solution.
>>
>> Yes, let's just revert the qcom_iommu.c changes from that patch for now.
>> The pm runtime stuff looks dodgy anyway so I think this needs more thought.
>
> Oh, right, blindly returning the -EBUSY from bus_set_iommu() because 
> we're not the first instance to probe is definitely the wrong thing to 
> do as well. It's still not clear why failing makes the DRM driver fall 
> over, but +1 to qcom-iommu needing some deeper consideration.

I've just checked: bus_set_iommu() is called for every
'qcom,msm-iommu-v1' device in the system, so it fails for the second
and every subsequent device.
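
To illustrate the failure mode, here is a minimal userspace model (not
kernel code, and not the actual qcom_iommu.c probe path). It assumes only
the behaviour described in this thread: the bus accepts one set of ops and
answers -EBUSY to any later registration. All model_* names are
hypothetical and exist only for this sketch.

```c
/* Userspace model only; every model_* name is hypothetical. */
#include <errno.h>
#include <stddef.h>

struct model_iommu_ops { const char *name; };
struct model_bus { const struct model_iommu_ops *iommu_ops; };

/* One set of ops per bus: a second registration gets -EBUSY. */
static int model_bus_set_iommu(struct model_bus *bus,
			       const struct model_iommu_ops *ops)
{
	if (ops == NULL) {
		bus->iommu_ops = NULL;
		return 0;
	}
	if (bus->iommu_ops != NULL)
		return -EBUSY;
	bus->iommu_ops = ops;
	return 0;
}

/* Probe that propagates the return value (the flow with the new check). */
static int model_probe_checked(struct model_bus *bus,
			       const struct model_iommu_ops *ops)
{
	return model_bus_set_iommu(bus, ops);
}

/* Probe that ignores the return value (the old flow). */
static int model_probe_unchecked(struct model_bus *bus,
				 const struct model_iommu_ops *ops)
{
	(void)model_bus_set_iommu(bus, ops);
	return 0;
}

/* Probe n instances on one bus and count how many probes fail. */
static int model_count_failed_probes(int n,
		int (*probe)(struct model_bus *, const struct model_iommu_ops *))
{
	static const struct model_iommu_ops ops = { "qcom-iommu-model" };
	struct model_bus bus = { NULL };
	int failed = 0;

	for (int i = 0; i < n; i++)
		if (probe(&bus, &ops) != 0)
			failed++;
	return failed;
}
```

In this model, with three instances on the same bus the checked probe
fails for the second and third, while the unchecked probe lets all three
through, which matches the observation above.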

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland