[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <CSIKESNNLX5D.4VDA3E6NBN3N@vincent-arch>
Date: Wed, 10 May 2023 13:35:43 +0200
From: "Vincenzo Palazzo" <vincenzopalazzodev@...il.com>
To: "Bjorn Helgaas" <helgaas@...nel.org>
Cc: <linux-pci@...r.kernel.org>, <robh@...nel.org>, <heiko@...ech.de>,
<kw@...ux.com>, <shawn.lin@...k-chips.com>,
<linux-kernel@...r.kernel.org>, <lgirdwood@...il.com>,
<linux-rockchip@...ts.infradead.org>, <broonie@...nel.org>,
<bhelgaas@...gle.com>,
<linux-kernel-mentees@...ts.linuxfoundation.org>,
<lpieralisi@...nel.org>, <linux-arm-kernel@...ts.infradead.org>,
"Dan Johansen" <strit@...jaro.org>
Subject: Re: [PATCH v1] drivers: pci: introduce configurable delay for
Rockchip PCIe bus scan
> Hi Vincenzo,
Hi :)
> Thanks for raising this issue. Let's see what we can do to address
> it.
Yeah, as I said in my cover letter, I am not happy with my solution,
but we should start somewhere to discuss it.
> > Add a configurable delay to the Rockchip PCIe driver to address
> > crashes that occur on some old devices, such as the Pine64 RockPro64.
> >
> > This issue is affecting the ARM community, but there is no
> > upstream solution for it yet.
>
> It sounds like this happens with several endpoints, right? And I
> assume the endpoints work fine in other non-Rockchip systems? If
> that's the case, my guess is the problem is with the Rockchip host
> controller and how it's initialized, not with the endpoints.
Yeah, the crash is only reproducible with the Rockchip system, or better,
the crash is reproducible only in some modern devices that use the old
Rockchip driver mentioned in this patch.
> The only delays and timeouts I see in the driver now are in
> rockchip_pcie_host_init_port(), where it waits for link training to
> complete. I assume the link training did completely successfully
> since you don't mention either a gen1 or gen2 timeout (although the
> gen2 message is a dev_dbg() that normally wouldn't go to the console).
>
> I don't know that the spec contains a retrain timeout value. Several
> other drivers use 1 second, while rockchip uses 500ms (for example,
> see LINK_RETRAIN_TIMEOUT and LINK_UP_TIMEOUT).
>
> I think we need to understand the issue better before adding a DT
> property and a module parameter. Those are hard for users to deal
> with. If we can figure out a value that works for everybody, it would
> be better to just hard-code it in the driver and use that all the
> time.
Yeah, I see, I see. This makes sense. Is there any path that I can follow in
order to better understand what's going on at the hardware level? In other
words, how can I help to understand this issue better and provide a
unique solution for everybody?
Thanks for the nits in the patch, I will take a look with a fresh mind
later in the day.
Cheers!
Vincent.
Powered by blists - more mailing lists