Message-ID: <bc07bd52-ed2e-0a44-80a7-36b581018b40@arm.com>
Date:   Wed, 30 Jun 2021 15:01:56 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Marek Szyprowski <m.szyprowski@...sung.com>,
        Will Deacon <will@...nel.org>
Cc:     Jean-Philippe Brucker <jean-philippe@...aro.org>,
        linux-arm-msm@...r.kernel.org, iommu@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org,
        Amey Narkhede <ameynarkhede03@...il.com>,
        Jon Hunter <jonathanh@...dia.com>,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

On 2021-06-30 14:48, Marek Szyprowski wrote:
> On 30.06.2021 14:59, Will Deacon wrote:
>> On Wed, Jun 30, 2021 at 02:48:15PM +0200, Marek Szyprowski wrote:
>>> On 08.06.2021 18:45, Amey Narkhede wrote:
>>>> If device registration fails, remove sysfs attribute
>>>> and if setting bus callbacks fails, unregister the device
>>>> and cleanup the sysfs attribute.
>>>>
>>>> Signed-off-by: Amey Narkhede <ameynarkhede03@...il.com>
>>> This patch landed in linux-next some time ago as commit 249c9dc6aa0d
>>> ("iommu/arm: Cleanup resources in case of probe error path"). After
>>> bisecting and some manual searching I finally found that it is
>>> responsible for breaking s2idle on DragonBoard 410c. Here is the log
>>> (captured with no_console_suspend):
>>>
>>> # time rtcwake -s10 -mmem
>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Thu Jan  1 00:02:13 1970
>>> PM: suspend entry (s2idle)
>>> Filesystems sync: 0.002 seconds
>>> Freezing user space processes ... (elapsed 0.006 seconds) done.
>>> OOM killer disabled.
>>> Freezing remaining freezable tasks ... (elapsed 0.004 seconds) done.
>>> Unable to handle kernel NULL pointer dereference at virtual address
>>> 0000000000000070
>>> Mem abort info:
>>>      ESR = 0x96000006
>>>      EC = 0x25: DABT (current EL), IL = 32 bits
>>>      SET = 0, FnV = 0
>>>      EA = 0, S1PTW = 0
>>>      FSC = 0x06: level 2 translation fault
>>> Data abort info:
>>>      ISV = 0, ISS = 0x00000006
>>>      CM = 0, WnR = 0
>>> user pgtable: 4k pages, 48-bit VAs, pgdp=000000008ad08000
>>> [0000000000000070] pgd=0800000085c3c003, p4d=0800000085c3c003,
>>> pud=0800000088dcf003, pmd=0000000000000000
>>> Internal error: Oops: 96000006 [#1] PREEMPT SMP
>>> Modules linked in: bluetooth ecdh_generic ecc rfkill ipv6 ax88796b
>>> venus_enc venus_dec videobuf2_dma_contig asix crct10dif_ce adv7511
>>> snd_soc_msm8916_analog qcom_spmi_temp_alarm rtc_pm8xxx qcom_pon
>>> qcom_camss qcom_spmi_vadc videobuf2_dma_sg qcom_vadc_common msm
>>> venus_core v4l2_fwnode v4l2_async snd_soc_msm8916_digital
>>> videobuf2_memops snd_soc_lpass_apq8016 snd_soc_lpass_cpu v4l2_mem2mem
>>> snd_soc_lpass_platform snd_soc_apq8016_sbc videobuf2_v4l2
>>> snd_soc_qcom_common qcom_rng videobuf2_common i2c_qcom_cci qnoc_msm8916
>>> videodev mc icc_smd_rpm mdt_loader socinfo display_connector rmtfs_mem
>>> CPU: 1 PID: 1522 Comm: rtcwake Not tainted 5.13.0-next-20210629 #3592
>>> Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
>>> pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>>> pc : msm_runtime_suspend+0x1c/0x60 [msm]
>>> lr : msm_pm_suspend+0x18/0x38 [msm]
>>> ...
>>> Call trace:
>>>     msm_runtime_suspend+0x1c/0x60 [msm]
>>>     msm_pm_suspend+0x18/0x38 [msm]
>>>     dpm_run_callback+0x84/0x378
>> I wonder if we're missing a pm_runtime_disable() call on the failure path?
>> i.e. something like the diff below...
> 
> I've checked and it doesn't fix anything.

What's happened previously? Has an IOMMU actually failed to probe, or is 
this a fiddly "code movement unveils latent bug elsewhere" kind of 
thing? There doesn't look to be much capable of going wrong in 
msm_runtime_suspend() itself, so is the DRM driver also in a broken 
half-probed state where it's left its pm_runtime_ops behind without its 
drvdata being valid?
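
To make that suspicion concrete, here is a rough userspace sketch of the failure mode, not the actual msm code. The fault address 0x70 in the log is consistent with reading a member at a small offset from a NULL base pointer; which structure that really is has not been established in this thread. Everything named fake_* or FAKE_* below is hypothetical, and the 0x70 offset is chosen only to mirror the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_ENODEV 19 /* hypothetical error code for this sketch */

/* Hypothetical stand-in for a driver's private data; not the msm layout. */
struct fake_priv {
	char pad[0x70];	/* pushes the next member to offset 0x70 */
	void *kms;	/* hypothetical member read by the suspend hook */
};

/* Hypothetical stand-in for struct device and its drvdata pointer. */
struct fake_device {
	void *driver_data;
};

/* Address an unguarded read of priv->kms would touch. */
static uintptr_t fault_address(const struct fake_device *dev)
{
	assert(dev != NULL);
	return (uintptr_t)dev->driver_data + offsetof(struct fake_priv, kms);
}

/* Guarded suspend hook: refuses to touch drvdata that was never set. */
static int fake_runtime_suspend(struct fake_device *dev)
{
	struct fake_priv *priv = dev->driver_data;

	if (!priv)
		return -FAKE_ENODEV;	/* device left half-probed */

	(void)priv->kms;		/* safe to dereference from here on */
	return 0;
}
```

With driver_data left NULL, fault_address() evaluates to 0x70 on common platforms, which is the shape of the oops above, while the guarded hook returns an error instead of faulting. A guard like this would only hide the symptom; the open question is still why the PM callbacks would run on a device whose probe never completed.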

Robin.

> 
>> Will
>>
>> --->8
>>
>> diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
>> index 25ed444ff94d..ce8f354755d0 100644
>> --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
>> +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
>> @@ -836,14 +836,14 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
>>           ret = devm_of_platform_populate(dev);
>>           if (ret) {
>>                   dev_err(dev, "Failed to populate iommu contexts\n");
>> -               return ret;
>> +               goto err_pm_disable;
>>           }
>>    
>>           ret = iommu_device_sysfs_add(&qcom_iommu->iommu, dev, NULL,
>>                                        dev_name(dev));
>>           if (ret) {
>>                   dev_err(dev, "Failed to register iommu in sysfs\n");
>> -               return ret;
>> +               goto err_pm_disable;
>>           }
>>    
>>           ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
>> @@ -869,6 +869,9 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
>>    
>>    err_sysfs_remove:
>>           iommu_device_sysfs_remove(&qcom_iommu->iommu);
>> +
>> +err_pm_disable:
>> +       pm_runtime_disable(dev);
>>           return ret;
>>    }
>>
> Best regards
> 
