Date:	Wed, 16 May 2012 09:29:51 -0400
From:	Don Dutile <ddutile@...hat.com>
To:	Alex Williamson <alex.williamson@...hat.com>
CC:	Bjorn Helgaas <bhelgaas@...gle.com>, kvm@...r.kernel.org,
	B07421@...escale.com, aik@...abs.ru, benh@...nel.crashing.org,
	linux-pci@...r.kernel.org, agraf@...e.de, qemu-devel@...gnu.org,
	chrisw@...s-sol.org, B08248@...escale.com,
	iommu@...ts.linux-foundation.org, gregkh@...uxfoundation.org,
	avi@...hat.com, benve@...co.com, dwmw2@...radead.org,
	linux-kernel@...r.kernel.org, david@...son.dropbear.id.au
Subject: Re: [PATCH 05/13] pci: New pci_acs_enabled()

On 05/15/2012 05:09 PM, Alex Williamson wrote:
> On Tue, 2012-05-15 at 13:56 -0600, Bjorn Helgaas wrote:
>> On Mon, May 14, 2012 at 4:49 PM, Alex Williamson
>> <alex.williamson@...hat.com>  wrote:
>>> On Mon, 2012-05-14 at 16:02 -0600, Bjorn Helgaas wrote:
>>>> On Fri, May 11, 2012 at 4:56 PM, Alex Williamson
>>>> <alex.williamson@...hat.com>  wrote:
>>>>> In a PCIe environment, transactions aren't always required to
>>>>> reach the root bus before being re-routed.  Peer-to-peer DMA
>>>>> may actually not be seen by the IOMMU in these cases.  For
>>>>> IOMMU groups, we want to provide IOMMU drivers a way to detect
>>>>> these restrictions.  Provided with a PCI device, pci_acs_enabled
>>>>> returns the furthest downstream device with a complete PCI ACS
>>>>> chain.  This information can then be used in grouping to create
>>>>> fully isolated groups.  ACS chain logic extracted from libvirt.
>>>>
>>>> The name "pci_acs_enabled()" sounds like it returns a boolean, but it doesn't.
>>>
>>> Right, maybe this should be:
>>>
>>> struct pci_dev *pci_find_upstream_acs(struct pci_dev *pdev);
>>>
+1; there is a global in the PCI code, pci_acs_enable, and a function
pci_enable_acs(), both of which the above name is easily confused with.
I recommend pci_find_top_acs_bridge(), which would be the most descriptive.
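
fwiw, a minimal sketch of what such a walk could look like (hypothetical
code, not the patch; for brevity it only checks for the *presence* of the
ACS extended capability -- a real check would also look at the ACS control
bits):

#include <linux/pci.h>

/* Hypothetical: walk upstream from pdev and return the root of the
 * smallest subtree that must be kept in one isolation group. */
struct pci_dev *pci_find_top_acs_bridge(struct pci_dev *pdev)
{
	struct pci_dev *acs_dev = pdev;	/* assume isolated until proven otherwise */
	struct pci_dev *bridge;

	/* walk toward the root bus */
	for (bridge = pdev->bus->self; bridge; bridge = bridge->bus->self) {
		/* no ACS here: everything below this bridge can p2p, so the
		 * group root moves up to (at least) this bridge */
		if (!pci_find_ext_capability(bridge, PCI_EXT_CAP_ID_ACS))
			acs_dev = bridge;
	}

	/* a device on the root bus has no upstream bridge and is returned
	 * as-is; the root IOMMU is assumed to cover it */
	return acs_dev;
}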

>>>> I'm not sure what "a complete PCI ACS chain" means.
>>>>
>>>> The function starts from "dev" and searches *upstream*, so I'm
>>>> guessing it returns the root of a subtree that must be contained in a
>>>> group.
>>>
>>> Any intermediate switch between an endpoint and the root bus can
>>> redirect a dma access without iommu translation,
>>
>> Is this "redirection" just the normal PCI bridge forwarding that
>> allows peer-to-peer transactions, i.e., the rule (from P2P bridge
>> spec, rev 1.2, sec 4.1) that the bridge apertures define address
>> ranges that are forwarded from primary to secondary interface, and the
>> inverse ranges are forwarded from secondary to primary?  For example,
>> here:
>>
>>                     ^
>>                     |
>>            +--------+-------+
>>            |                |
>>     +------+-----+    +-----++-----+
>>     | Downstream |    | Downstream |
>>     |    Port    |    |    Port    |
>>     |   06:05.0  |    |   06:06.0  |
>>     +------+-----+    +------+-----+
>>            |                 |
>>       +----v----+       +----v----+
>>       | Endpoint|       | Endpoint|
>>       | 07:00.0 |       | 08:00.0 |
>>       +---------+       +---------+
>>
>> that rule is all that's needed for a transaction from 07:00.0 to be
>> forwarded from upstream to the internal switch bus 06, then claimed by
>> 06:06.0 and forwarded downstream to 08:00.0.  This is plain old PCI,
>> nothing specific to PCIe.
>
> Right, I think the main PCI difference is the point-to-point nature of
> PCIe vs legacy PCI bus.  On a legacy PCI bus there's no way to prevent
> devices talking to each other, but on PCIe the transaction makes a
> U-turn at some point and heads out another downstream port.  ACS allows
> us to prevent that from happening.
>
detail: PCIe up/downstream routing is really done by an internal switch;
        instead of the legacy PCI base/limit address routing, ACS *forces*
        the switch to always route the transaction from a downstream port
        to the upstream port.
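
to put that in config-space terms, a hedged sketch of testing whether a
downstream port really is forced to route upstream (my choice of
PCI_ACS_RR | PCI_ACS_UF as the interesting bits is an assumption, not
something taken from the patch):

#include <linux/pci.h>

/* Does this port have p2p redirection disabled, i.e. are requests forced
 * up to the upstream port (and eventually the IOMMU)? */
static bool acs_p2p_redirect_disabled(struct pci_dev *pdev)
{
	u16 ctrl;
	u16 want = PCI_ACS_RR | PCI_ACS_UF;	/* request redirect + upstream forwarding */
	int pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);

	if (!pos)
		return false;			/* no ACS capability at all */

	pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
	return (ctrl & want) == want;
}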

>> I don't understand ACS very well, but it looks like it basically
>> provides ways to prevent that peer-to-peer forwarding, so transactions
>> would be sent upstream toward the root (and specifically, the IOMMU)
>> instead of being directly claimed by 06:06.0.
>
> Yep, that's my meager understanding as well.
>
+1

>>> so we're looking for
>>> the furthest upstream device for which acs is enabled all the way up to
>>> the root bus.
>>
>> Correct me if this is wrong: To force device A's DMAs to be processed
>> by an IOMMU, ACS must be enabled on the root port and every downstream
>> port along the path to A.
>
> Yes, modulo this comment in libvirt source:
>
>      /* if we have no parent, and this is the root bus, ACS doesn't come
>       * into play since devices on the root bus can't P2P without going
>       * through the root IOMMU.
>       */
>
Correct. PCIe spec says roots must support ACS. I believe all the
root bridges that have an IOMMU have ACS wired in/on.

> So we assume that a redirect at the point of the iommu will factor in
> iommu translation.
>
>> If so, I think you're trying to find out the closest upstream device X
>> such that everything leading to X has ACS enabled.  Every device below
>> X can DMA freely to other devices below X, so they would all have to
>> be in the same isolated group.
>
> Yes
>
>> I tried to work through some examples to develop some intuition about this:
>
> (inserting fixed url)
>> http://www.asciiflow.com/#3736558963405980039
>
>> pci_acs_enabled(00:00.0) = 00:00.0 (on root bus (but doesn't it matter
>> if 00:00.0 is PCIe or if RP has ACS?))
>
> Hmm, the latter is the assumption above.  For the former, I think
> libvirt was probably assuming that PCI devices must have a PCIe device
> upstream from them because x86 doesn't have assignment friendly IOMMUs
> except on PCIe.  I'll need to work on making that more generic.
>
>> pci_acs_enabled(00:01.0) = 00:01.0 (on root bus)
>> pci_acs_enabled(01:00.0) = 01:00.0 (acs_dev = 00:01.0, 01:00.0 is not
>> PCIe; seems wrong)
>
> Oops, I'm calling pci_find_upstream_pcie_bridge() first on any of my
> input devices, so this was passing for me.  I'll need to incorporate
> that generically.
>
>> pci_acs_enabled(00:02.0) = 00:02.0 (on root bus; seems wrong if RP
>> doesn't have ACS)
>
> Yeah, let me validate the libvirt assumption.  I see ACS on my root
> port, so maybe they're just assuming it's always enabled or that the
> precedence favors IOMMU translation.  I'm also starting to think that we
> might want "from" and "to" struct pci_dev parameters to make it more
> flexible where the iommu lives in the system.
>
see comment above wrt root ports that have IOMMUs in them.
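
as an illustration only, a 'from'/'to' variant might look something like
this (hypothetical sketch, not the patch; it only checks capability
presence):

#include <linux/pci.h>

/* Hypothetical: is every bridge on the path from 'from' up to (but not
 * including) 'to' ACS-capable?  Running off the top of the tree without
 * meeting 'to' is treated as success, on the assumption that the IOMMU
 * sits at or above the root bus. */
static bool pci_acs_path_isolated(struct pci_dev *from, struct pci_dev *to)
{
	struct pci_dev *bridge;

	for (bridge = from->bus->self; bridge && bridge != to;
	     bridge = bridge->bus->self)
		if (!pci_find_ext_capability(bridge, PCI_EXT_CAP_ID_ACS))
			return false;

	return true;
}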

>> pci_acs_enabled(02:00.0) = 00:02.0 (acs_dev = 00:02.0, 02:00.0 has no ACS cap)
>> pci_acs_enabled(03:00.0) = 00:02.0 (acs_dev = 00:02.0)
>> pci_acs_enabled(02:01.0) = 02:01.0 (acs_dev = 00:02.0, 02:01.0 has ACS enabled)
>> pci_acs_enabled(04:00.0) = 04:00.0 (acs_dev = 02:01.0, 04:00.0 is not
>> a bridge; seems wrong if 04:00 is a multi-function device)
>
> AIUI, ACS is not an endpoint property, so this is what should happen.  I
> don't think multifunction plays a role other than how much do we trust
> the implementation to not allow back channels between functions (the
> answer should probably be not at all).
>
correct. ACS is a *bridge* property.
The unknown wrt multifunction devices is that such a device *could* be
implemented with a hidden PCI bridge (one that does not respond to PCI cfg
accesses from the downstream port) between the functions within the device.
Such a bridge could allow peer-to-peer transactions, and there is no way for
an OS to force ACS on it.  So one has to ask the hw vendors whether such a
hidden device exists in the implementation, and whether peer-to-peer is
enabled/allowed -- a hidden PCI bridge/PCIe switch could just as well be
hardwired to push all IO to the upstream port and let the parent bridge
re-route it back down if peer-to-peer is desired.
Debate exists over whether multifunction devices are 'secure' because of
this unknown.  Maybe a PCIe (at minimum, SR-IOV) spec change is needed in
this area so the OS can determine this status for a device (via pci cfg/cap
space).
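
if grouping code wanted to encode that conservative answer ("trust not at
all"), one sketch -- an assumption on my part, not anything from the patch
or the spec -- would be:

#include <linux/pci.h>

/* Sketch: treat a multifunction device that exposes no ACS capability as a
 * single isolation unit, since a hidden internal bridge could allow
 * peer-to-peer between its functions behind the OS's back. */
static bool pci_mf_functions_isolated(struct pci_dev *pdev)
{
	if (!pdev->multifunction)
		return true;	/* only one function, nothing to peer with */

	/* no ACS capability => we cannot prove the functions are isolated */
	return pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS) != 0;
}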

>> pci_acs_enabled(02:02.0) = 02:02.0 (acs_dev = 00:02.0, 02:02.0 has ACS enabled)
>> pci_acs_enabled(05:00.0) = 05:00.0 (acs_dev = 02:02.0, 05:00.0 is not a bridge)
>>
>> But it didn't really help.  I still can't develop a mental picture of
>> what this function does.
>
> It helped me :)  These are good examples, I'll work on fixing it for
> them.  Thanks,
>
> Alex
>
>
