lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 13 Jun 2019 11:10:39 +0200
From:   Marc Gonzalez <marc.w.gonzalez@...e.fr>
To:     Bjorn Andersson <bjorn.andersson@...aro.org>,
        Kishon Vijay Abraham I <kishon@...com>
Cc:     MSM <linux-arm-msm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Evan Green <evgreen@...omium.org>,
        Vivek Gautam <vivek.gautam@...eaurora.org>,
        Niklas Cassel <niklas.cassel@...aro.org>,
        Stanimir Varbanov <svarbanov@...sol.com>
Subject: Re: [PATCH] phy: qcom-qmp: Correct READY_STATUS poll break condition

On 12/06/2019 19:25, Bjorn Andersson wrote:

> On Wed 12 Jun 09:24 PDT 2019, Marc Gonzalez wrote:
> 
>> On 05/06/2019 01:24, Bjorn Andersson wrote:
>>
>>> After issuing a PHY_START request to the QMP, the hardware documentation
>>> states that the software should wait for the PCS_READY_STATUS to become 1.
>>>
>>> With the introduction of c9b589791fc1 ("phy: qcom: Utilize UFS reset
>>> controller") an additional 1ms delay was introduced between the start
>>> request and the check of the status bit. This greatly increases the
>>> chances for the hardware to actually becoming ready before the status
>>> bit is read.
>>>
>>> The result can be seen in that UFS PHY enabling is now reported as a
>>> failure in 10% of the boots on SDM845, which is a clear regression from
>>> the previous rare/occasional failure.
>>>
>>> This patch fixes the "break condition" of the poll to check for the
>>> correct state of the status bit.
>>>
>>> Unfortunately PCIe on 8996 and 8998 does not specify the mask_pcs_ready
>>> register, which means that the code checks a bit that's always 0. So the
>>> patch also fixes these, in order to not regress these targets.
>>>
>>> Cc: stable@...r.kernel.org
>>> Cc: Evan Green <evgreen@...omium.org>
>>> Cc: Marc Gonzalez <marc.w.gonzalez@...e.fr>
>>> Cc: Vivek Gautam <vivek.gautam@...eaurora.org>
>>> Fixes: 73d7ec899bd8 ("phy: qcom-qmp: Add msm8998 PCIe QMP PHY support")
>>> Fixes: e78f3d15e115 ("phy: qcom-qmp: new qmp phy driver for qcom-chipsets")
>>> Signed-off-by: Bjorn Andersson <bjorn.andersson@...aro.org>
>>> ---
>>>
>>> @Kishon, this is a regression spotted in v5.2-rc1, so please consider applying
>>> this towards v5.2.
>>>
>>>  drivers/phy/qualcomm/phy-qcom-qmp.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> index cd91b4179b10..43abdfd0deed 100644
>>> --- a/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c
>>> @@ -1074,6 +1074,7 @@ static const struct qmp_phy_cfg msm8996_pciephy_cfg = {
>>>  
>>>  	.start_ctrl		= PCS_START | PLL_READY_GATE_EN,
>>>  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
>>> +	.mask_pcs_ready		= PHYSTATUS,
>>>  	.mask_com_pcs_ready	= PCS_READY,
>>>  
>>>  	.has_phy_com_ctrl	= true,
>>> @@ -1253,6 +1254,7 @@ static const struct qmp_phy_cfg msm8998_pciephy_cfg = {
>>>  
>>>  	.start_ctrl             = SERDES_START | PCS_START,
>>>  	.pwrdn_ctrl		= SW_PWRDN | REFCLK_DRV_DSBL,
>>> +	.mask_pcs_ready		= PHYSTATUS,
>>>  	.mask_com_pcs_ready	= PCS_READY,
>>>  };
>>>  
>>> @@ -1547,7 +1549,7 @@ static int qcom_qmp_phy_enable(struct phy *phy)
>>>  	status = pcs + cfg->regs[QPHY_PCS_READY_STATUS];
>>>  	mask = cfg->mask_pcs_ready;
>>>  
>>> -	ret = readl_poll_timeout(status, val, !(val & mask), 1,
>>> +	ret = readl_poll_timeout(status, val, val & mask, 1,
>>>  				 PHY_INIT_COMPLETE_TIMEOUT);
>>>  	if (ret) {
>>>  		dev_err(qmp->dev, "phy initialization timed-out\n");
>>
>> Your patch made me realize that:
>> msm8998_pciephy_cfg.has_phy_com_ctrl = false
>> thus
>> msm8998_pciephy_cfg.mask_com_pcs_ready is useless, AFAICT.
> 
> While 8998 has a COM block, it does (among other things) not have a
> ready bit. So afaict has_phy_com_ctrl = false is correct.

Pfff... Working blind without the HPG sucks...

> The addition of mask_pcs_ready is part of resolving the regression in
> 5.2, so I suggest that we remove mask_com_pcs_ready separately.

I agree that it should be done separately.
I'll send a patch on top of yours.

>> (I copied msm8996_pciephy_cfg for msm8998_pciephy_cfg)
>>
>> Does msm8996_pciephy_cfg really need both mask_pcs_ready AND
>> mask_com_pcs_ready?
> 
> 8996 has a COM block and it contains both the control bits and the
> status bits, so that looks correct.

Thanks for checking.

>> I'll test your patch tomorrow.
> 
> I appreciate that.

Here are my observations for a 8998 board:

1) If I apply only the readl_poll_timeout() fix (not the mask_pcs_ready fixup)
qcom_pcie_probe() fails with a timeout in phy_init.
=> this is in line with your regression analysis.

2) Your patch also fixes a long-standing bug in UFS init whereby sending
lots of information to the console during phy init would lead to an
incorrectly diagnosed time-out.

Good stuff!

Reviewed-by: Marc Gonzalez <marc.w.gonzalez@...e.fr>
Tested-by: Marc Gonzalez <marc.w.gonzalez@...e.fr>

Regards.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ