linux-kernel - Re: [PATCH 3/4] memory: mtk-smi: Add sleep ctrl function

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <47c30f6f-ce9d-9ea7-283c-9026ae9ed1c0@collabora.com>
Date:   Tue, 7 Dec 2021 09:56:59 +0100
From:   AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>
To:     Yong Wu <yong.wu@...iatek.com>,
        Krzysztof Kozlowski <krzysztof.kozlowski@...onical.com>
Cc:     Krzysztof Kozlowski <krzk@...nel.org>,
        Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
        Robin Murphy <robin.murphy@....com>,
        Tomasz Figa <tfiga@...omium.org>,
        linux-mediatek@...ts.infradead.org, srv_heupstream@...iatek.com,
        linux-kernel@...r.kernel.org, devicetree@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        iommu@...ts.linux-foundation.org, youlin.pei@...iatek.com,
        anan.sun@...iatek.com, lc.kan@...iatek.com, yi.kuo@...iatek.com,
        anthony.huang@...iatek.com, Rob Herring <robh+dt@...nel.org>,
        Matthias Brugger <matthias.bgg@...il.com>
Subject: Re: [PATCH 3/4] memory: mtk-smi: Add sleep ctrl function

Il 07/12/21 07:24, Yong Wu ha scritto:
> Hi AngeloGioacchino,
> 
> Thanks for your review.
> 
> On Mon, 2021-12-06 at 16:08 +0100, AngeloGioacchino Del Regno wrote:
>> Il 03/12/21 07:40, Yong Wu ha scritto:
>>> sleep control means that when the larb go to sleep, we should wait
>>> a bit
>>> until all the current commands are finished. thus, when the larb
>>> runtime
>>> suspend, we need enable this function to wait until all the existed
>>> command are finished. when the larb resume, just disable this
>>> function.
>>> This function only improve the safe of bus. Add a new flag for this
>>> function. Prepare for mt8186.
>>>
>>> Signed-off-by: Anan Sun <anan.sun@...iatek.com>
>>> Signed-off-by: Yong Wu <yong.wu@...iatek.com>
>>> ---
>>>    drivers/memory/mtk-smi.c | 39
>>> +++++++++++++++++++++++++++++++++++----
>>>    1 file changed, 35 insertions(+), 4 deletions(-)
> 
> [snip]
> 
>>>    static int __maybe_unused mtk_smi_larb_suspend(struct device
>>> *dev)
>>>    {
>>>    	struct mtk_smi_larb *larb = dev_get_drvdata(dev);
>>> +	int ret = 0;
>>> +
>>> +	if (MTK_SMI_CAPS(larb->larb_gen->flags_general,
>>> MTK_SMI_FLAG_SLEEP_CTL))
>>> +		ret = mtk_smi_larb_sleep_ctrl(dev, true);
>>
>> Sorry but what happens if SLP_PROT_RDY is not getting set properly?
>>   From what I can understand in the commit description that you wrote,
>> if we reach
>> the timeout, then the LARB transactions are not over....
>>
>> I see that you are indeed returning a failure here, but you are also
>> turning off
>> the clocks regardless of whether we get a failure or a success; I'm
>> not sure that
>> this is right, as this may leave the hardware in an unpredictable
>> state (since
>> there were some more LARB transactions that didn't go through),
>> leading to crashes
>> at system resume (or when retyring to suspend).
> 
> Thanks for this question. In theory you are right. In this case, the
> bus already hang.
> 
> We only printed a fail log in this patch. If this fail happens, we
> should request the master to check which case cause the larb hang.
> 
> If the master has a good reason or limitation, the hang is expected, I
> think we have to add larb reset in this fail case: Reset the larb when
> the larb runtime resume.
> 

Think about the case in which the system gets resumed only partially due to a

failure during resume of some driver, or due to a RTC or arch timer resume and
suspend right after... or perhaps during runtime suspend/resume of some devices.
In that case, we definitely want to avoid any kind of failure point that would
lead to a system crash, or any kind of user noticeable (or UX disrupting) "strange
behavior".

I think that we should make sure that the system suspends cleanly, instead of
patching up any possible leftover issue at resume time: if this is doable with
a LARB reset in suspend error case, that looks like being a good option indeed.

As a side note, thinking about UX, losing a little more time during suspend is
nothing really noticeable for the user... on the other hand, spending more time
during resume may be something noticeable to the user.
For this reason, I think that guaranteeing that the system resumes as fast as
possible is very important, which adds up to the need of suspending cleanly.

> Fortunately, we have never got this issue. We could add this reset when
> necessary. Is this OK for you?
> 
> Thanks.
> 
>>
>>>    
>>>    	clk_bulk_disable_unprepare(larb->smi.clk_num, larb->smi.clks);
>>> -	return 0;
>>> +	return ret;
>>>    }
>>>    
>>>    static const struct dev_pm_ops smi_larb_pm_ops = {
>>>
>>
>>