Message-ID: <BY3PR21MB3033B9B260CEA13A5C2DE9ECD7139@BY3PR21MB3033.namprd21.prod.outlook.com>
Date: Fri, 18 Mar 2022 20:36:53 +0000
From: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To: Robin Murphy <robin.murphy@....com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>,
"rafael@...nel.org" <rafael@...nel.org>,
"lenb@...nel.org" <lenb@...nel.org>,
"lorenzo.pieralisi@....com" <lorenzo.pieralisi@....com>,
"robh@...nel.org" <robh@...nel.org>, "kw@...ux.com" <kw@...ux.com>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"hch@....de" <hch@....de>,
"m.szyprowski@...sung.com" <m.szyprowski@...sung.com>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>
Subject: RE: [PATCH 4/4 RESEND] PCI: hv: Propagate coherence from VMbus device
to PCI device
From: Robin Murphy <robin.murphy@....com> Sent: Friday, March 18, 2022 3:58 AM
>
> On 2022-03-18 05:12, Michael Kelley (LINUX) wrote:
> > From: Robin Murphy <robin.murphy@....com> Sent: Thursday, March 17, 2022 10:15 AM
> >>
> >> On 2022-03-17 16:25, Michael Kelley via iommu wrote:
> >>> PCI pass-thru devices in a Hyper-V VM are represented as a VMBus
> >>> device and as a PCI device. The coherence of the VMbus device is
> >>> set based on the VMbus node in ACPI, but the PCI device has no
> >>> ACPI node and defaults to not hardware coherent. This results
> >>> in extra software coherence management overhead on ARM64 when
> >>> devices are hardware coherent.
> >>>
> >>> Fix this by propagating the coherence of the VMbus device to the
> >>> PCI device. There's no effect on x86/x64 where devices are
> >>> always hardware coherent.
> >>>
> >>> Signed-off-by: Michael Kelley <mikelley@...rosoft.com>
> >>> ---
> >>> drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++----
> >>> 1 file changed, 13 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> >>> index ae0bc2f..14276f5 100644
> >>> --- a/drivers/pci/controller/pci-hyperv.c
> >>> +++ b/drivers/pci/controller/pci-hyperv.c
> >>> @@ -49,6 +49,7 @@
> >>> #include <linux/refcount.h>
> >>> #include <linux/irqdomain.h>
> >>> #include <linux/acpi.h>
> >>> +#include <linux/dma-map-ops.h>
> >>> #include <asm/mshyperv.h>
> >>>
> >>> /*
> >>> @@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device *hbus)
> >>> }
> >>>
> >>> /*
> >>> - * Set NUMA node for the devices on the bus
> >>> + * Set NUMA node and DMA coherence for the devices on the bus
> >>> */
> >>> -static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
> >>> +static void hv_pci_assign_properties(struct hv_pcibus_device *hbus)
> >>> {
> >>> struct pci_dev *dev;
> >>> struct pci_bus *bus = hbus->bridge->bus;
> >>> @@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
> >>> numa_map_to_online_node(
> >>> hv_dev->desc.virtual_numa_node));
> >>>
> >>> + /*
> >>> + * On ARM64, propagate the DMA coherence from the VMbus device
> >>> + * to the corresponding PCI device. On x86/x64, these calls
> >>> + * have no effect because DMA is always hardware coherent.
> >>> + */
> >>> + dev_set_dma_coherent(&dev->dev,
> >>> + dev_is_dma_coherent(&hbus->hdev->device));
> >>
> >> Eww... if you really have to do this, I'd prefer to see a proper
> >> hv_dma_configure() helper implemented and wired up to
> >> pci_dma_configure(). Although since it's a generic property I guess at
> >> worst pci_dma_configure could perhaps propagate coherency from the host
> >> bridge to its children by itself in the absence of any other firmware
> >> info. And it's built-in so could use arch_setup_dma_ops() like everyone
> >> else.
> >>
> >
> > I'm not seeing an existing mechanism to provide a "helper" or override
> > of pci_dma_configure(). Could you elaborate? Or is this something
> > that needs to be created?
>
> I mean something like the diff below (other #includes omitted for
> clarity). Essentially if VMBus has its own way of describing parts of
> the system, then for those parts it's nice if it could fit into the same
> abstractions we use for firmware-based system description.

OK, got it.  Adding the VMbus special case into pci_dma_configure()
would work.  But after investigating further, let me throw out two
other possible solutions as well.

1) It's easy to make the top-level VMbus node in the DSDT be
the ACPI companion for the PCI host bridge.  The function
pcibios_root_bridge_prepare() will do this if the pci-hyperv.c
driver sets sysdata->parent to that top-level VMbus node.  Then
pci_dma_configure() works as-is.  Also, commit 7d40c0f70, which
special-cases pcibios_root_bridge_prepare() for Hyper-V, becomes
unnecessary.  I've coded this approach and it seems to work, but
I don't know all the implications of making that VMbus node be
the ACPI companion, potentially for multiple PCI host bridges.
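
The core of (1) is roughly the following -- just an untested sketch of
the idea, not the actual patch.  It assumes that on ARM64 hbus->sysdata
is the struct pci_config_window that pcibios_root_bridge_prepare()
consumes, and that the VMbus child device's parent is the ACPI VMbus
device:

/*
 * Sketch only (assumptions as noted above).  Called from hv_pci_probe()
 * before the host bridge is scanned, so pcibios_root_bridge_prepare()
 * can find the ACPI companion via sysdata->parent.
 */
static void hv_pci_set_acpi_parent(struct hv_pcibus_device *hbus,
                                   struct hv_device *hdev)
{
#ifdef CONFIG_ARM64
        /* Top-level VMbus node in the DSDT becomes the ACPI companion */
        hbus->sysdata.parent = hdev->device.parent;
#endif
}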

2) Since there's no vIOMMU, and we don't know how one would be
described if it's added in the future, it's a bit premature
to try to set things up "correctly" now to handle it.  A simple
approach is to set dma_default_coherent to "true" if the top-level
VMbus node in the DSDT indicates coherent.  No other changes are
needed.  If there's a vIOMMU at some later time, then we add the
use of arch_setup_dma_ops() for each device.
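
For (2), the change would be small -- something like the sketch below
in the VMbus ACPI probe path (vmbus_acpi_add() in vmbus_drv.c).  Again
just a sketch; it assumes dma_default_coherent from
<linux/dma-map-ops.h> can simply be set from this context and is
honored on ARM64 for devices with no firmware description of their
own:

#include <linux/dma-map-ops.h>
#include <linux/property.h>

/* Sketch only: make the DSDT VMbus node's coherence the system-wide default */
static void vmbus_set_default_coherence(struct acpi_device *device)
{
        if (device_get_dma_attr(&device->dev) == DEV_DMA_COHERENT)
                dma_default_coherent = true;
}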

Any thoughts on these two approaches vs. adding the VMbus
special case into pci_dma_configure()?  My preference would be
to avoid adding that special case if one of the other two
approaches is reasonable.

Michael

>
> Cheers,
> Robin.
>
> ----->8-----
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 588588cfda48..7d92ccad1569 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -20,6 +20,7 @@
> #include <linux/of_device.h>
> #include <linux/acpi.h>
> #include <linux/dma-map-ops.h>
> +#include <linux/hyperv.h>
> #include "pci.h"
> #include "pcie/portdrv.h"
>
> @@ -1602,6 +1603,8 @@ static int pci_dma_configure(struct device *dev)
> struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
>
> ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
> + } else if (is_vmbus_dev(bridge)) {
> + ret = hv_dma_configure(dev, device_get_dma_attr(bridge));
> }
>
> pci_put_host_bridge_device(bridge);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index f565a8938836..d1d4dd3d5a3a 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1764,4 +1764,19 @@ static inline unsigned long virt_to_hvpfn(void *addr)
> #define HVPFN_DOWN(x) ((x) >> HV_HYP_PAGE_SHIFT)
> #define page_to_hvpfn(page) (page_to_pfn(page) * NR_HV_HYP_PAGES_IN_PAGE)
>
> +static inline bool is_vmbus_dev(struct device *dev)
> +{
> + /*
> + * dev->bus == &hv_bus would break when the caller is built-in
> + * and CONFIG_HYPERV=m, so look for it by name instead.
> + */
> + return !strcmp(dev->bus->name, "vmbus");
> +}
> +
> +static inline int hv_dma_configure(struct device *dev, enum dev_dma_attr attr)
> +{
> + arch_setup_dma_ops(dev, 0, 0, NULL, attr == DEV_DMA_COHERENT);
> + return 0;
> +}
> +
> #endif /* _HYPERV_H */