lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0iWAaj5_hBC_1pZcA-cQ0Yz6hvQjbsv3Gmv6jN_utt4OQ@mail.gmail.com>
Date: Thu, 10 Jul 2025 21:53:18 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: "David E. Box" <david.e.box@...ux.intel.com>
Cc: rafael@...nel.org, bhelgaas@...gle.com, andrea.righi@...onical.com, 
	vicamo.yang@...onical.com, kenny@...ix.com, nirmal.patel@...ux.intel.com, 
	linux-pm@...r.kernel.org, linux-pci@...r.kernel.org, 
	ilpo.jarvinen@...ux.intel.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/ASPM: Allow ASPM enablement for devices behind Intel VMD

On Thu, Jul 10, 2025 at 3:16 AM David E. Box
<david.e.box@...ux.intel.com> wrote:
>
> Devices behind Intel's Volume Management Device (VMD) controller reside on
> a synthetic PCI hierarchy that is intentionally hidden from ACPI and
> firmware. As such, BIOS does not configure ASPM for these devices, and the
> responsibility for link power management, including ASPM and LTR, falls
> entirely to the VMD driver.
>
> However, the current PCIe ASPM code prevents ASPM configuration when the
> ACPI_FADT_NO_ASPM flag is set, disallowing OS management. This leaves ASPM
> permanently disabled for these devices, contrary to the platform's design
> intent.
>
> Introduce a callback mechanism that allows the VMD driver to enable ASPM
> for its domain, bypassing the global ACPI_FADT_NO_ASPM restriction that is
> not applicable in this context. This ensures that devices behind VMD can
> benefit from ASPM savings as originally intended.
>
> Link: https://lore.kernel.org/linux-pm/0b166ece-eeec-ba5d-2212-50d995611cef@panix.com
> Signed-off-by: David E. Box <david.e.box@...ux.intel.com>

First of all, thanks for doing this work, much appreciated!

> ---
>  drivers/pci/controller/vmd.c | 28 ++++++++++++++++++++++++++--
>  drivers/pci/pci.h            |  8 ++++++++
>  drivers/pci/pcie/aspm.c      | 36 +++++++++++++++++++++++++++++++++++-
>  3 files changed, 69 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 8df064b62a2f..e685586dc18b 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -21,6 +21,8 @@
>
>  #include <asm/irqdomain.h>
>
> +#include "../pci.h"
> +
>  #define VMD_CFGBAR     0
>  #define VMD_MEMBAR1    2
>  #define VMD_MEMBAR2    4
> @@ -730,7 +732,7 @@ static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>  }
>
>  /*
> - * Enable ASPM and LTR settings on devices that aren't configured by BIOS.
> + * Enable LTR settings on devices that aren't configured by BIOS.
>   */
>  static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>  {
> @@ -770,10 +772,27 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>          * PCIe r6.0, sec 5.5.4.
>          */
>         pci_set_power_state_locked(pdev, PCI_D0);
> -       pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);

Do I think correctly that this doesn't work because of the
aspm_disabled check in __pci_enable_link_state()?

>         return 0;
>  }
>
> +static long vmd_get_link_state(struct pci_dev *pdev, void *data)
> +{
> +       struct pci_bus *vmd_bus = data;
> +       struct pci_bus *bus = pdev->bus;
> +
> +       while (bus) {
> +               if (bus == vmd_bus)
> +                       return PCIE_LINK_STATE_ALL;
> +
> +               if (!bus->self)
> +                       break;
> +
> +               bus = bus->self->bus;
> +       }

If I'm not mistaken, it would be sufficient to do a check like

    if (pci_dev->bus->ops == &vmd_ops)
            return PCIE_LINK_STATE_ALL;

instead of the above, or if not then why not?

> +
> +       return -ENODEV;
> +}
> +
>  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  {
>         struct pci_sysdata *sd = &vmd->sysdata;
> @@ -785,6 +804,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>         resource_size_t membar2_offset = 0x2000;
>         struct pci_bus *child;
>         struct pci_dev *dev;
> +       struct pcie_aspm_vmd_link_state vmd_link_state;
>         int ret;
>
>         /*
> @@ -911,6 +931,10 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>                 return -ENODEV;
>         }
>
> +       vmd_link_state.cb = vmd_get_link_state;
> +       vmd_link_state.data = vmd->bus;
> +       pci_register_vmd_link_state_cb(&vmd_link_state);
> +
>         vmd_copy_host_bridge_flags(pci_find_host_bridge(vmd->dev->bus),
>                                    to_pci_host_bridge(vmd->bus->bridge));
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 12215ee72afb..dcf7d39c660f 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -821,6 +821,12 @@ void pci_configure_aspm_l1ss(struct pci_dev *dev);
>  void pci_save_aspm_l1ss_state(struct pci_dev *dev);
>  void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>
> +
> +struct pcie_aspm_vmd_link_state {
> +       long (*cb)(struct pci_dev *pdev, void *data);
> +       void *data;
> +};
> +
>  #ifdef CONFIG_PCIEASPM
>  void pcie_aspm_init_link_state(struct pci_dev *pdev);
>  void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> @@ -828,6 +834,7 @@ void pcie_aspm_pm_state_change(struct pci_dev *pdev, bool locked);
>  void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
>  void pci_configure_ltr(struct pci_dev *pdev);
>  void pci_bridge_reconfigure_ltr(struct pci_dev *pdev);
> +void pci_register_vmd_link_state_cb(struct pcie_aspm_vmd_link_state *state);
>  #else
>  static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
>  static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> @@ -835,6 +842,7 @@ static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev, bool locked)
>  static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
>  static inline void pci_configure_ltr(struct pci_dev *pdev) { }
>  static inline void pci_bridge_reconfigure_ltr(struct pci_dev *pdev) { }
> +void pci_register_vmd_link_state_cb(struct pcie_aspm_vmd_link_state *state) { }
>  #endif
>
>  #ifdef CONFIG_PCIE_ECRC
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 29fcb0689a91..c609d3c309be 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -320,6 +320,27 @@ static int policy_to_clkpm_state(struct pcie_link_state *link)
>         return 0;
>  }
>
> +static struct pcie_aspm_vmd_link_state vmd_state;
> +
> +void pci_register_vmd_link_state_cb(struct pcie_aspm_vmd_link_state *state)
> +{
> +       mutex_lock(&aspm_lock);
> +       vmd_state.cb = state->cb;
> +       vmd_state.data = state->data;
> +       mutex_unlock(&aspm_lock);
> +}
> +EXPORT_SYMBOL_GPL(pci_register_vmd_link_state_cb);
> +
> +static long pci_get_vmd_link_state(struct pci_dev *pdev)
> +{
> +       int state = -ENODEV;
> +
> +       if (vmd_state.cb)
> +               state = vmd_state.cb(pdev, vmd_state.data);
> +
> +       return state;
> +}
> +
>  static void pci_update_aspm_saved_state(struct pci_dev *dev)
>  {
>         struct pci_cap_saved_state *save_state;
> @@ -794,6 +815,7 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>         u32 parent_lnkcap, child_lnkcap;
>         u16 parent_lnkctl, child_lnkctl;
>         struct pci_bus *linkbus = parent->subordinate;
> +       int vmd_aspm_default;
>
>         if (blacklist) {
>                 /* Set enabled/disable so that we will disable ASPM later */
> @@ -865,8 +887,20 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>                 pcie_capability_write_word(child, PCI_EXP_LNKCTL, child_lnkctl);
>         }
>
> +       /*
> +        * Devices behind Intel VMD operate on a synthetic PCI bus that BIOS
> +        * and ACPI do not enumerate or configure. ASPM for these devices must
> +        * be managed by the VMD driver itself, independent of global ACPI ASPM
> +        * constraints. This callback mechanism allows selective ASPM
> +        * enablement for such domains.
> +        */
> +       vmd_aspm_default = pci_get_vmd_link_state(parent);
> +
>         /* Save default state */
> -       link->aspm_default = link->aspm_enabled;
> +       if (vmd_aspm_default < 0)
> +               link->aspm_default = link->aspm_enabled;
> +       else
> +               link->aspm_default = vmd_aspm_default;

Well, this is not nice.

First off, it adds VMD-specific stuff to otherwise generic ASPM code.
Second, it doesn't actually do anything about the aspm_disabled checks
all over the place, so they will still trigger even though ASPM will
be effectively enabled for devices on the VMD bus.

>
>         /* Setup initial capable state. Will be updated later */
>         link->aspm_capable = link->aspm_support;
>
> base-commit: d0b3b7b22dfa1f4b515fd3a295b3fd958f9e81af
> --

It appears to me that the underlying problem is that aspm_disabled is
global and it is set during PCI root bus creation in
acpi_pci_root_add().  Thus it affects all of the PCI buses even though
the BIOS says that it wants to control ASPM for this particular PCIe
hierarchy.  If there were another PCI root enumerated by ACPI where
the OS would be allowed to control ASPM, it would not work just like
the VMD case.

To me, this suggests an approach based on moving the "ASPM disabled
because the BIOS wants to control it" setting to pci_bus_flags_t and
setting it on a per-hierarchy basis.  Since the VMD bus is a separate
PCIe hierarchy from the kernel perspective, it will not have this flag
set and the OS should be able to configure ASPM for devices on that
bus.

Do I think correctly or am I overlooking something here?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ