Message-ID: <20200710193003.2lt3i5ocy5kk3b3p@pali>
Date: Fri, 10 Jul 2020 21:30:03 +0200
From: Pali Rohár <pali@...nel.org>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
Andrew Murray <amurray@...goodpenguin.co.uk>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Marek Behún <marek.behun@....cz>,
Remi Pommarel <repk@...plefau.lt>,
Tomasz Maciej Nowak <tmn505@...il.com>,
Xogium <contact@...ium.me>, linux-pci@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] PCI: aardvark: Don't touch PCIe registers if no card
connected
On Friday 10 July 2020 11:08:28 Bjorn Helgaas wrote:
> On Fri, Jul 10, 2020 at 05:44:58PM +0200, Pali Rohár wrote:
> > I can reproduce following issue: Connect Compex WLE900VX card, configure
> > aardvark to gen2 mode. And then card is detected only after the first
> > link training. If kernel tries to retrain link again (e.g. via ASPM
> > code) then card is not detected anymore.
>
> Somebody should go over the ASPM retrain link code and the PCIe spec
> with a fine-toothed comb. Maybe we're doing something wrong there.
I think this is not ASPM related, as the card simply disappears right
after the PCI_EXP_LNKCTL_RL bit is set a second time, without any change
to the ASPM bits.
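For reference, here is a minimal user-space sketch of what setting
PCI_EXP_LNKCTL_RL means at the register level. The offsets and bit values
are the ones from include/uapi/linux/pci_regs.h; struct fake_port and the
helper names are hypothetical stand-ins for the config-space accessors and
are not taken from the aardvark driver:

```c
#include <stdint.h>

#define PCI_EXP_LNKCTL      0x10    /* Link Control, offset within PCIe capability */
#define PCI_EXP_LNKCTL_RL   0x0020  /* Retrain Link */
#define PCI_EXP_LNKSTA      0x12    /* Link Status */
#define PCI_EXP_LNKSTA_LT   0x0800  /* Link Training in progress */

/* Hypothetical model of a downstream port's PCIe capability block. */
struct fake_port {
	uint8_t cap[0x40];
};

/* Little-endian 16-bit read from the modelled capability. */
static uint16_t cap_read16(const struct fake_port *p, unsigned int off)
{
	return (uint16_t)(p->cap[off] | (p->cap[off + 1] << 8));
}

/* Little-endian 16-bit write to the modelled capability. */
static void cap_write16(struct fake_port *p, unsigned int off, uint16_t val)
{
	p->cap[off] = (uint8_t)(val & 0xff);
	p->cap[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Request a retrain: read-modify-write that sets only the RL bit and
 * leaves every other Link Control bit (including the ASPM bits) alone.
 * This model simply stores the bit; on real hardware RL reads back as 0.
 */
static void retrain_link(struct fake_port *p)
{
	uint16_t lnkctl = cap_read16(p, PCI_EXP_LNKCTL);

	cap_write16(p, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RL);
}

/* Non-zero while the modelled port reports link training in progress. */
static int link_is_training(const struct fake_port *p)
{
	return (cap_read16(p, PCI_EXP_LNKSTA) & PCI_EXP_LNKSTA_LT) != 0;
}
```

Calling retrain_link() twice on this model changes nothing besides the RL
bit, which is the sequence after which the card disappears for me.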
> Or maybe aardvark has some hardware issue and we need some sort of
> quirk to work around it.
It is possible that this is an aardvark issue. As I said, I really do
not know.
The aardvark driver already contains a merged workaround for this issue:
the driver forces gen1 aardvark mode for a gen1 card.
> > Another issue which happens for WLE900VX, WLE600VX and WLE1216VS-20 (but
> > not for WLE200VX): Linux kernel can detect these cards only if it issues
> > card reset via PERST# signal and start link training (via standard pcie
> > endpoint register PCI_EXP_LNKCTL/PCI_EXP_LNKCTL_RL)
>
> I think you mean "downstream port" (not "endpoint") register?
Yes.
> PCI_EXP_LNKCTL_RL is only applicable to *downstream ports* (root ports
> or switch downstream ports) and is reserved for endpoints.
>
> > immediately after
> > enable link training in aardvark (via aardvark specific LINK_TRAINING_EN
> > bit). If there is e.g. 100ms delay between enabling link training and
> > setting PCI_EXP_LNKCTL_RL bit then these cards are not detected.
>
> This sounds problematic. Hardware should not be dependent on the
> software being "fast enough". In general we should be able to insert
> arbitrary delays at any point without breaking anything.
Yes, it is problematic. For example, the following commit broke those cards:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4c7d053d7f77cd5c1a1ba7c7ce085ddba13d1d7
And this commit fixed it (the msleep was just moved to a different stage):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6964494582f56a3882c2c53b0edbfe99eb32b2e1
But we somehow need to deal with it until we find the root cause.
Basically, an additional sleep in the aardvark init phase can break
WLE900VX cards, but not WLE200VX.
And because WLE900VX works fine with pci-mvebu and WLE200VX works fine
with pci-aardvark, we cannot deduce whether the problem with the
combination of WLE900VX and aardvark lies in WLE900VX or in aardvark.
> But I have the impression that aardvark requires more software
> hand-holding than most hardware does. If it imposes timing
> requirements on the software, that *should* be documented in the
> aardvark spec.
There is absolutely nothing about timings in the documentation that I
have seen. The documentation just gives instructions/steps for
initializing the PCI subsystem, and that is basically the
advk_pcie_setup_hw() function.
> > I read in kernel bugzilla that WLE600VX and WLE900VX cards are buggy and
> > more people have problems with them. But issues described in kernel
> > bugzilla (like card is reporting incorrect PCI device id) I'm not
> > observing.
>
> Pointer?
Hm... I cannot find the bugzilla pointer right now, but I have a pointer
to the ath9k-devel mailing list with that incorrect device ID:
https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
> Is the incorrect device ID 0xffff?
No, the incorrect device ID in that case is 0xabcd and the vendor ID is
correct (Qualcomm).
> That could be a symptom
> of a PCIe error. If we read a device ID that's something other than
> 0, 0xffff, or the correct ID, that would be really weird. Even 0
> would be really strange.
It is strange, and it is also the reason why the discussion on that list
is long.
As I said, I'm not seeing that problem with the wrong device ID.
But I know people who are observing the same problem as described in the
above mailing list thread, with Compex ath10k cards on different boards
(which do not use aardvark).
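To make the distinction discussed above concrete, here is a small
self-contained sketch that sorts a value read from config offset 0 into
the cases mentioned: the correct ID, 0xffff, 0, or something else such as
0xabcd. The enum and function names are hypothetical and only illustrate
the check; this is not code from the kernel:

```c
#include <stdint.h>

enum id_verdict {
	ID_OK,          /* device ID matches the expected one */
	ID_ALL_ONES,    /* 0xffff: usual symptom of a failed config read */
	ID_ZERO,        /* 0: would already be really strange */
	ID_UNEXPECTED,  /* anything else, e.g. 0xabcd */
};

/*
 * id_dword is the 32-bit value at config offset 0:
 * Vendor ID in bits 15:0, Device ID in bits 31:16.
 * expected is the device ID the card is supposed to report.
 */
static enum id_verdict classify_device_id(uint32_t id_dword, uint16_t expected)
{
	uint16_t device = (uint16_t)(id_dword >> 16);

	if (device == expected)
		return ID_OK;
	if (device == 0xffff)
		return ID_ALL_ONES;
	if (device == 0)
		return ID_ZERO;
	return ID_UNEXPECTED;
}
```

The case from the ath9k-devel thread (correct vendor ID, device ID 0xabcd)
ends up in ID_UNEXPECTED, which is what makes it so odd.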
> I suspect these wifi cards are a little special because they probably
> play unusual games with power for airplane mode and the like.
This is another/different problem and is already "documented" in kernel
bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=84821#c52