Message-Id: <1246732818.3892.446.camel@macbook.infradead.org>
Date: Sat, 04 Jul 2009 19:40:18 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Chris Wright <chrisw@...s-sol.org>
Cc: Fenghua Yu <fenghua.yu@...el.com>,
'Linus Torvalds' <torvalds@...ux-foundation.org>,
'Stephen Rothwell' <sfr@...b.auug.org.au>,
'Andrew Morton' <akpm@...ux-foundation.org>,
'Ingo Molnar' <mingo@...e.hu>,
'Christopher Wright' <chrisw@...hat.com>,
'Allen Kay' <allen.m.kay@...el.com>,
'iommu' <iommu@...ts.linux-foundation.org>,
'lkml' <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] IOMMU Identity Mapping Support:
iommu_identity_mapping definition
On Thu, 2009-06-18 at 11:13 -0700, Chris Wright wrote:
> * Fenghua Yu (fenghua.yu@...el.com) wrote:
> > IOMMU Identity Mapping Support: iommu_identity_mapping definition
> >
> > Identity mapping for the IOMMU defines a single domain which maps all PCI
> > devices 1:1 to all usable memory.
> >
> > This reduces map/unmap overhead in the DMA API and improves IOMMU
> > performance. On 10Gb network cards, netperf shows no performance
> > degradation compared to non-IOMMU performance.
> >
> > This method may lose some DMA remapping benefits, such as isolation.
> >
> > The first patch defines the iommu_identity_mapping variable, which controls
> > the identity mapping code and is 0 by default.
>
> The only real difference between "pt" and "identity" is hardware support.
> We should have a single value, so we don't have to tell users to do
> different things depending on their hardware (they won't even know what
> they have) to achieve the same result.
The _code_ ought to be a lot more shared than it is, too. Currently, the
hardware pass-through support has bugs that the software identity
mapping doesn't. It doesn't remove devices from the identity map if they
are limited to 32-bit DMA and a driver tries to set up mappings, which
is quite suboptimal. And it doesn't put them _back_ into the identity
map after they're detached from a VM, AFAICT.
I was going to fix that and unify the code paths, but then I found a bug
with the software identity mapping too -- if you have a PCI device which
is only capable of 32-bit DMA and it's behind a bridge (such as the
ohci1394 device on a Tylersburg SDV, although you'll have to hack the
kernel to pretend not to have the hardware PT support), it'll cause a
BUG() when it first sets up a mapping. What happens is this:
First it removes that device from the si_domain because it can only address
4GiB of RAM; then get_domain_for_dev() puts it right back _in_ the
si_domain, because it inherits its domain from the upstream PCI bridge. And
then we BUG() in domain_get_iommu(), which _really_ doesn't want to see the
si_domain.
I _think_ this is the best fix for that...

From 3dfc813d94bba2046c6aed216e0fd69ac93a8e03 Mon Sep 17 00:00:00 2001
From: David Woodhouse <David.Woodhouse@...el.com>
Date: Sat, 4 Jul 2009 19:11:08 +0100
Subject: [PATCH] intel-iommu: Don't use identity mapping for PCI devices behind bridges
Our current strategy for pass-through mode is to put all devices into
the 1:1 domain at startup (which is before we know what their dma_mask
will be), and only _later_ take them out of that domain, if it turns out
that they really can't address all of memory.
However, when there are a bunch of PCI devices behind a bridge, they all
end up with the same source-id on their DMA transactions, and hence in
the same IOMMU domain. This means that we _can't_ easily move them from
the 1:1 domain into their own domain at runtime, because there might be DMA
in-flight from their siblings.
So we have to adjust our pass-through strategy: For PCI devices not on
the root bus, and for the bridges which will take responsibility for
their transactions, we have to start up _out_ of the 1:1 domain, just in
case.
This fixes the BUG() we see when we have 32-bit-capable devices behind a
PCI-PCI bridge, and use the software identity mapping.
It does mean that we might end up using 'normal' mapping mode for some
devices which could actually live with the faster 1:1 mapping -- but
this is only for PCI devices behind bridges, which presumably aren't the
devices for which people are most concerned about performance.
Signed-off-by: David Woodhouse <David.Woodhouse@...el.com>
---
drivers/pci/intel-iommu.c | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)
diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index f9fc4f3..360fb67 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2122,6 +2122,36 @@ static int iommu_should_identity_map(struct pci_dev *pdev, int startup)
 	if (iommu_identity_mapping == 2)
 		return IS_GFX_DEVICE(pdev);
 
+	/*
+	 * We want to start off with all devices in the 1:1 domain, and
+	 * take them out later if we find they can't access all of memory.
+	 *
+	 * However, we can't do this for PCI devices behind bridges,
+	 * because all PCI devices behind the same bridge will end up
+	 * with the same source-id on their transactions.
+	 *
+	 * Practically speaking, we can't change things around for these
+	 * devices at run-time, because we can't be sure there'll be no
+	 * DMA transactions in flight for any of their siblings.
+	 *
+	 * So PCI devices (unless they're on the root bus) as well as
+	 * their parent PCI-PCI or PCIe-PCI bridges must be left _out_ of
+	 * the 1:1 domain, just in _case_ one of their siblings turns out
+	 * not to be able to map all of memory.
+	 */
+	if (!pdev->is_pcie) {
+		if (!pci_is_root_bus(pdev->bus))
+			return 0;
+		if (pdev->class >> 8 == PCI_CLASS_BRIDGE_PCI)
+			return 0;
+	} else if (pdev->pcie_type == PCI_EXP_TYPE_PCI_BRIDGE)
+		return 0;
+
+	/*
+	 * At boot time, we don't yet know if devices will be 64-bit capable.
+	 * Assume that they will -- if they turn out not to be, then we can
+	 * take them out of the 1:1 domain later.
+	 */
 	if (!startup)
 		return pdev->dma_mask > DMA_BIT_MASK(32);
--
1.6.2.5
--
David Woodhouse Open Source Technology Centre
David.Woodhouse@...el.com Intel Corporation