[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <qqskde7ar4dthidew2wrwbu5zzqhna7b27ivsyngzdanztdzwf@nvmxi7kzqkgv>
Date: Fri, 18 Jul 2025 22:49:05 +0530
From: Manivannan Sadhasivam <mani@...nel.org>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Baochen Qiang <baochen.qiang@....qualcomm.com>,
manivannan.sadhasivam@....qualcomm.com, Jeff Johnson <jjohnson@...nel.org>,
Lorenzo Pieralisi <lpieralisi@...nel.org>, Krzysztof Wilczyński <kwilczynski@...nel.org>,
Rob Herring <robh@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Nirmal Patel <nirmal.patel@...ux.intel.com>, Jonathan Derrick <jonathan.derrick@...ux.dev>,
linux-wireless@...r.kernel.org, linux-kernel@...r.kernel.org, ath12k@...ts.infradead.org,
ath11k@...ts.infradead.org, ath10k@...ts.infradead.org, ilpo.jarvinen@...ux.intel.com,
linux-arm-msm@...r.kernel.org, linux-pci@...r.kernel.org,
Krishna Chaitanya Chundru <krishna.chundru@....qualcomm.com>, Qiang Yu <qiang.yu@....qualcomm.com>
Subject: Re: [PATCH 4/6] wifi: ath12k: Use pci_{enable/disable}_link_state()
APIs to enable/disable ASPM states
On Fri, Jul 18, 2025 at 11:26:00AM GMT, Bjorn Helgaas wrote:
> On Fri, Jul 18, 2025 at 05:19:28PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Jul 18, 2025 at 07:05:03PM GMT, Baochen Qiang wrote:
> > > On 7/18/2025 6:20 PM, Manivannan Sadhasivam wrote:
> > > > On Fri, Jul 18, 2025 at 01:27:27PM GMT, Manivannan Sadhasivam wrote:
> > > >> On Fri, Jul 18, 2025 at 10:05:02AM GMT, Baochen Qiang wrote:
> > > >>> On 7/17/2025 7:29 PM, Manivannan Sadhasivam wrote:
> > > >>>> On Thu, Jul 17, 2025 at 06:46:12PM GMT, Baochen Qiang wrote:
> > > >>>>> On 7/17/2025 6:31 PM, Manivannan Sadhasivam wrote:
> > > >>>>>> On Thu, Jul 17, 2025 at 05:24:13PM GMT, Baochen Qiang wrote:
> > > >>>>>>
> > > >>>>>> [...]
> > > >>>>>>
> > > >>>>>>>> @@ -16,6 +16,8 @@
> > > >>>>>>>> #include "mhi.h"
> > > >>>>>>>> #include "debug.h"
> > > >>>>>>>>
> > > >>>>>>>> +#include "../ath.h"
> > > >>>>>>>> +
> > > >>>>>>>> #define ATH12K_PCI_BAR_NUM 0
> > > >>>>>>>> #define ATH12K_PCI_DMA_MASK 36
> > > >>>>>>>>
> > > >>>>>>>> @@ -928,8 +930,7 @@ static void ath12k_pci_aspm_disable(struct ath12k_pci *ab_pci)
> > > >>>>>>>> u16_get_bits(ab_pci->link_ctl, PCI_EXP_LNKCTL_ASPM_L1));
> > > >>>>>>>>
> > > >>>>>>>> /* disable L0s and L1 */
> > > >>>>>>>> - pcie_capability_clear_word(ab_pci->pdev, PCI_EXP_LNKCTL,
> > > >>>>>>>> - PCI_EXP_LNKCTL_ASPMC);
> > > >>>>>>>> + pci_disable_link_state(ab_pci->pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
> > > >>>>>>>
> > > >>>>>>> Not always, but sometimes seems the 'disable' does not work:
> > > >>>>>>>
> > > >>>>>>> [ 279.920507] ath12k_pci_power_up 1475: link_ctl 0x43 //before disable
> > > >>>>>>> [ 279.920539] ath12k_pci_power_up 1482: link_ctl 0x43 //after disable
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> set_bit(ATH12K_PCI_ASPM_RESTORE, &ab_pci->flags);
> > > >>>>>>>> }
> > > >>>>>>>> @@ -958,10 +959,7 @@ static void ath12k_pci_aspm_restore(struct ath12k_pci *ab_pci)
> > > >>>>>>>> {
> > > >>>>>>>> if (ab_pci->ab->hw_params->supports_aspm &&
> > > >>>>>>>> test_and_clear_bit(ATH12K_PCI_ASPM_RESTORE, &ab_pci->flags))
> > > >>>>>>>> - pcie_capability_clear_and_set_word(ab_pci->pdev, PCI_EXP_LNKCTL,
> > > >>>>>>>> - PCI_EXP_LNKCTL_ASPMC,
> > > >>>>>>>> - ab_pci->link_ctl &
> > > >>>>>>>> - PCI_EXP_LNKCTL_ASPMC);
> > > >>>>>>>> + pci_enable_link_state(ab_pci->pdev, ath_pci_aspm_state(ab_pci->link_ctl));
> > > >>>>>>>
> > > >>>>>>> always, the 'enable' is not working:
> > > >>>>>>>
> > > >>>>>>> [ 280.561762] ath12k_pci_start 1180: link_ctl 0x43 //before restore
> > > >>>>>>> [ 280.561809] ath12k_pci_start 1185: link_ctl 0x42 //after restore
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> Interesting! I applied your diff and I never see this issue so far (across 10+
> > > >>>>>> reboots):
> > > >>>>>
> > > >>>>> I was not testing reboot. Here is what I am doing:
> > > >>>>>
> > > >>>>> step1: rmmod ath12k
> > > >>>>> step2: force LinkCtrl using setpci (make sure it is 0x43, which seems more likely to see
> > > >>>>> the issue)
> > > >>>>>
> > > >>>>> sudo setpci -s 02:00.0 0x80.B=0x43
> > > >>>>>
> > > >>>>> step3: insmod ath12k and check linkctrl
> > > >>>>>
> > > >>>>
> > > >>>> So I did the same and got:
> > > >>>>
> > > >>>> [ 3283.363569] ath12k_pci_power_up 1475: link_ctl 0x43
> > > >>>> [ 3283.363769] ath12k_pci_power_up 1480: link_ctl 0x40
> > > >>>> [ 3284.007661] ath12k_pci_start 1180: link_ctl 0x40
> > > >>>> [ 3284.007826] ath12k_pci_start 1185: link_ctl 0x42
> > > >>>>
> > > >>>> My host machine is Qcom based Thinkpad T14s and it doesn't
> > > >>>> support L0s. So that's why the lnkctl value once enabled
> > > >>>> becomes 0x42. This is exactly the reason why the drivers
> > > >>>> should not muck around LNKCTL register manually.
> > > >>>
> > > >>> Thanks, then the 0x43 -> 0x40 -> 0x40 -> 0x42 sequence should
> > > >>> not be a concern. But still the random 0x43 -> 0x43 -> 0x43 ->
> > > >>> 0x42 sequence seems problematic.
> > > >>>
> > > >>> How many iterations have you done with above steps? From my
> > > >>> side it seems random so better to do some stress test.
> > > >>>
> > > >>
> > > >> So I ran the modprobe for about 50 times on the Intel NUC that
> > > >> has QCA6390, but didn't spot the disparity. This is the script
> > > >> I used:
> > > >>
> > > >> for i in {1..50} ;do echo "Loop $i"; sudo setpci -s 01:00.0 0x80.B=0x43;\
> > > >> sudo modprobe -r ath11k_pci; sleep 1; sudo modprobe ath11k_pci; sleep 1;done
> > > >>
> > > >> And I always got:
> > > >>
> > > >> [ 5862.388083] ath11k_pci_aspm_disable: 609 lnkctrl: 0x43
> > > >> [ 5862.388124] ath11k_pci_aspm_disable: 614 lnkctrl: 0x40
> > > >> [ 5862.876291] ath11k_pci_start: 880 lnkctrl: 0x40
> > > >> [ 5862.876346] ath11k_pci_start: 886 lnkctrl: 0x42
> > > >>
> > > >> Also no AER messages. TBH, I'm not sure how you were able to
> > > >> see the random issues with these APIs. That looks like a race,
> > > >> which is scary.
> > > >>
> > > >> I do not want to ignore your scenario, but would like to
> > > >> reproduce and get to the bottom of it.
> > > >
> > > > I synced with Baochen internally and able to repro the issue.
> > > > Ths issue is due to hand modifying the LNKCTL register from
> > > > userspace. The PCI core maintains the ASPM state internally and
> > > > uses it to change the state when the
> > > > pci_{enable/disable}_link_state*() APIs are called.
> > > >
> > > > So if the userspace or a client driver modifies the LNKCTL
> > > > register manually, it makes the PCI cached ASPM states invalid.
> > > > So while this series fixes the driver from doing that, nothing
> > > > prevents userspace from doing so using 'setpci' and other tools.
> > > > Userspace should only use sysfs attributes to change the state
> > > > and avoid modifying the PCI registers when the PCI core is
> > > > controlling the device. So this is the reason behind the
> > > > errantic behavior of the API and it is not due to the issue with
> > > > the API or the PCI core.
> > >
> > > IMO we can not rely on userspace doing what or not doing what, or
> > > on how it is doing, right? So can we fix PCI core to avoid this?
> >
> > I'm not sure it is possible to *fix* the PCI core here. Since the
> > PCI core gives userspace access to the entire config space of the
> > device, the userspace reads/writes to any of the registers it want.
> > So unless the config space access if forbidden if a driver is bound
> > to the device, it is inevitable. And then there is also /dev/mem...
> >
> > Interestingly, there is an API available for this purpose:
> > pci_request_config_region_exclusive(), but it is used only by the
> > AMD arch driver to prevent userspace from writing to the entire
> > config space of the device.
> >
> > Maybe it makes sense to use something like this to prevent the
> > userspace access to the entire config space if the driver is bind to
> > the device.
>
> I'm not really a fan of pci_request_config_region_exclusive() because
> it's such a singleton thing. I don't like to be one of only a few
> users of an interface.
>
If the API is serving the purpose, I don't see why we cannot be 'one among the
few users'.
> Linux has a long tradition of allowing root users to shoot themselves
> in the foot, and setpci is very useful as a debugging tool. Maybe
> tainting the kernel for config writes from userspace, and possibly
> even a WARN_ONCE() at the time, would be a compromise.
>
I really do not see a need to let the userspace modify the config space when a
driver is bind to it. It fully makes sense when there is no driver attached to
the device. But if there is one, then it just warrants trouble.
If we want to be really cautious with userspace tooling, then we can introduce
a kernel config option similar to CONFIG_IO_STRICT_DEVMEM and keep it disabled
by default.
- Mani
--
மணிவண்ணன் சதாசிவம்
Powered by blists - more mailing lists