linux-kernel - Re: [PATCH v3 2/2] PCI: Rockchip: Add Rockchip PCIe controller support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a18b7027-5627-7827-ce28-059cc5819d08@rock-chips.com>
Date:	Thu, 23 Jun 2016 10:15:04 +0800
From:	Shawn Lin <shawn.lin@...k-chips.com>
To:	Brian Norris <briannorris@...omium.org>
Cc:	shawn.lin@...k-chips.com, Bjorn Helgaas <bhelgaas@...gle.com>,
	devicetree@...r.kernel.org, Heiko Stuebner <heiko@...ech.de>,
	Arnd Bergmann <arnd@...db.de>,
	Marc Zyngier <marc.zyngier@....com>, linux-pci@...r.kernel.org,
	Wenrui Li <wenrui.li@...k-chips.com>,
	linux-kernel@...r.kernel.org,
	Doug Anderson <dianders@...omium.org>,
	linux-rockchip@...ts.infradead.org,
	Rob Herring <robh+dt@...nel.org>
Subject: Re: [PATCH v3 2/2] PCI: Rockchip: Add Rockchip PCIe controller
 support

在 2016/6/23 9:35, Brian Norris 写道:
> Hi,
>
> On Thu, Jun 23, 2016 at 09:09:46AM +0800, Shawn Lin wrote:
>> 在 2016/6/23 8:29, Brian Norris 写道:
>>> On Thu, Jun 16, 2016 at 09:50:35AM +0800, Shawn Lin wrote:
>
> [...]
>
>>>> +	/* 500ms timeout value should be enough for gen1/2 taining */
>>>> +	timeout = jiffies + msecs_to_jiffies(500);
>>>> +
>>>> +	err = -ETIMEDOUT;
>>>> +	while (time_before(jiffies, timeout)) {
>>>> +		status = pcie_read(port, PCIE_CLIENT_BASIC_STATUS1);
>>>> +		if (((status >> PCIE_CLIENT_LINK_STATUS_SHIFT) &
>>>> +		      PCIE_CLIENT_LINK_STATUS_MASK) ==
>>>> +		      PCIE_CLIENT_LINK_STATUS_UP) {
>>>> +			dev_dbg(port->dev, "pcie link training gen1 pass!\n");
>>>> +			err = 0;
>>>> +			break;
>>>> +		}
>>>> +		msleep(20);
>>>> +	}
>>>
>>> Technically, the above timeout loop is not quite correct. Possible error
>>> case: we can fail with a timeout after 500 ms if the training completes
>>> between the 480-500 ms time window. This can happen because you're
>>> doing:
>>>
>>> (1) read register: if complete, then terminate successfully
>>> (2) delay
>>> (3) check for timeout: if time is up, return error
>>>
>>> You actually need:
>>>
>>> (1) read register: if complete, then terminate successfully
>>> (2) delay
>>> (3) check for timeout: if time is up, repeat (1), and then report error
>>>
>>> You can examine the logic for readx_poll_timeout() in
>>> include/linux/iopoll.h to see an example of a proper timeout loop. You
>>> could even try to use one of the helpers there, except that your
>>> pcie_read() takes 2 args.
>>
>> I see, thanks.
>>
>>>
>>>> +	if (err) {
>>>> +		dev_err(port->dev, "pcie link training gen1 timeout!\n");
>>>> +		return err;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * Enable retrain for gen2. This should be configured only after
>>>> +	 * gen1 finished.
>>>> +	 */
>>>> +	status = pcie_read(port,
>>>> +			   PCIR_RC_CONFIG_LCS + PCIE_RC_CONFIG_BASE);
>>>> +	status |= PCIE_CORE_LCSR_RETAIN_LINK;
>>>> +	pcie_write(port, status,
>>>> +		   PCIR_RC_CONFIG_LCS + PCIE_RC_CONFIG_BASE);
>>>
>>> I'm not really an expert on this, but how are you actually "retraining
>>> for gen2"? Is that just the behavior of the core, that it retries at the
>>> higher rate on the 2nd training attempt? I'm just confused, since you
>>> set PCIE_CLIENT_GEN_SEL_2 above, so you've already allowed either gen1
>>> or gen2 negotiation AFAICT, and so seemingly you might already have
>>> negotiated at gen2.
>>
>>
>> Not really. I allow the core to enable gen2, but it needs a extra
>> setting of retrain after finishing gen1. It's not so strange as it
>> depends on the vendor's design. So I have to say it fits the
>> designer's expectation.
>
> OK.
>
>>>> +	err = -ETIMEDOUT;
>>>> +
>>>> +	while (time_before(jiffies, timeout)) {
>>>
>>> You never reset 'timeout' when starting this loop. So you only have a
>>> cumulative 500 ms to do both the gen1 and gen2 loops. Is that
>>> intentional? (I feel like this comment was made on an earlier revision
>>> of this driver, though I haven't read everything thoroughly.)
>>
>> yes, I don't have any docs to let me know how long should I wait for
>> gen1/2 to be finished. Maybe someday it will be augmented to a larger
>> value if finding a device actually need a longer time. But the only
>> thing I can say is that it's from my test for many pcie devices
>> currently.
>>
>>
>> Do you agree?
>
> I'm not suggesting increasing the timeout, exactly; I'm suggesting that
> you should set some minimum timeout for each training loop, instead of
> reusing the same deadline. i.e., something like this before the second
> loop:
>
> 	/* Reset the deadline for gen2 */
> 	timeout = jiffies + msecs_to_jiffies(500);


ok, I will update it.

>
> As it stands, if the first loop takes 490 ms, that leaves you only 10 ms
> for the second loop. And I think it'd be confusing if we ever see the
> first loop take a "long" time, and then we time out on the second --
> we'd be blaming the gen2 training for gen1's slowness.
>
>>>> +		status = pcie_read(port, PCIE_CORE_CTRL_MGMT_BASE);
>>>> +		if (((status >> PCIE_CORE_PL_CONF_SPEED_SHIFT) &
>>>> +		      PCIE_CORE_PL_CONF_SPEED_MASK) ==
>>>> +		      PCIE_CORE_PL_CONF_SPEED_50G) {
>>>> +			dev_dbg(port->dev, "pcie link training gen2 pass!\n");
>>>> +			err = 0;
>>>> +			break;
>>>> +		}
>>>> +		msleep(20);
>>>> +	}
>>>
>>> Similar problem with your timeout loop, as mentioned above. Although I
>>> confused about what you do in the error case here...
>>>
>>>> +	if (err)
>>>> +		dev_dbg(port->dev, "pcie link training gen2 timeout, force to gen1!\n");
>>>
>>> ... how are you forcing gen1? I don't see any code that indicates this.
>>> Should you at least be checking that there aren't some kind of training
>>> errors, and that we settled back on gen1/2.5G?
>>
>> yes, when failing gen2,  my pcie controller will fallback to gen1
>> automatically.
>
> OK. Well maybe the text should say something like "falling back" instead
> of "force"?
>

Sounds good, will fix. Thanks.

>> if we pass the gen1 then fail to train gen2, a dbg msg here is enough
>> here to let we know that we should check the HW signal. Of course we
>> should make sure that this device supports gen2.
>
> OK.
>
> [...]
>
> Brian
>
>
>


-- 
Best Regards
Shawn Lin