[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <MWHPR12MB1600255ACFCD3FB3C80EB8B6C88B0@MWHPR12MB1600.namprd12.prod.outlook.com>
Date: Wed, 9 Aug 2017 12:33:34 +0000
From: Casey Leedom <leedom@...lsio.com>
To: Bjorn Helgaas <helgaas@...nel.org>,
Ding Tianhong <dingtianhong@...wei.com>
CC: "ashok.raj@...el.com" <ashok.raj@...el.com>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
Michael Werner <werner@...lsio.com>,
Ganesh GR <ganeshgr@...lsio.com>,
"asit.k.mallick@...el.com" <asit.k.mallick@...el.com>,
"patrick.j.cramer@...el.com" <patrick.j.cramer@...el.com>,
"Suravee.Suthikulpanit@....com" <Suravee.Suthikulpanit@....com>,
"Bob.Shaw@....com" <Bob.Shaw@....com>,
"l.stach@...gutronix.de" <l.stach@...gutronix.de>,
"amira@...lanox.com" <amira@...lanox.com>,
"gabriele.paoloni@...wei.com" <gabriele.paoloni@...wei.com>,
"David.Laight@...lab.com" <David.Laight@...lab.com>,
"jeffrey.t.kirsher@...el.com" <jeffrey.t.kirsher@...el.com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"will.deacon@....com" <will.deacon@....com>,
"mark.rutland@....com" <mark.rutland@....com>,
"robin.murphy@....com" <robin.murphy@....com>,
"davem@...emloft.net" <davem@...emloft.net>,
"alexander.duyck@...il.com" <alexander.duyck@...il.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linuxarm@...wei.com" <linuxarm@...wei.com>
Subject: Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
| From: Bjorn Helgaas <helgaas@...nel.org>
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
|
| if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
| adapter->flags |= ROOT_NO_RELAXED_ORDERING;
|
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...
Which is back to something very close to what I initially suggested in my
first patch submission. Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.
Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be
able to see our Root Port in order to make this determination. (And in some
Hypervisor implementations I've seen, there's not even a synthetic Root Port
available to the VM at all, let alone read-only access to the real one.)
So the current scheme was developed of having the Hypervisor kernel traverse
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue that we've mostly been primarily focused upon), and turning off the
PCIe Capability Device Control[Relaxed Ordering Enable]. This was to serve
two purposes:
1. Turn that off in order to prevent sending any Transaction Layer
Packets with the Relaxed Ordering Attribute to any Completer.
Which unfortunately would also prevent Peer-to-Peer use of the
Relaxed Ordering Attribute.
2. Act as a message to Device Drivers for those downstream devices
that the they were dealing with a broken Root Port implementation.
And this would work even for a driver in a VM with an attached
device since it would be able to see the PCIe Configuration Space
for the attached device.
I haven't been excited about any of this because:
A. While so far all of the examples we've talked about are broken
Root Port Completers, it's perfectly possible that other devices
could be broken -- say an NVMe Device which is not "Coherent
Memory". How would this fit into the framework APIs being
described?
B. I have yet to see a n example of how the currently proposed
API framework would be used in a hybrid environment where
TLPs to the Root Port would not use Relaxed Ordering, but TLPs
to a Peer would use Relaxed Ordering. So far its all been about
using a "big hammer" to completely disable the use of Relaxed
Ordering.
But the VM problem keeps cropping up over and over. A driver in a VM
doesn't have access to the Root Port to determine if its on a "Black List"
and our only way of communicating with the driver in the VM is to leave the
device in a particular state (i.e. initialize the PCIe Capability Device
Control[Relaxed Ordering Enable] to "off").
Oh, and also, on the current patch submission's focus on broken Root Port
implementations: one could suggest that even if we're stuck with the "Device
attached to a VM Conundrum", that what we should really be thinking about is
if ANY device within a PCIe Fabric has broken Relaxed Ordering completion
problems, and, if so, "poisoning" the entire containing PCIe Fabric by
turning off Relaxed Ordering Enable for every device, up, down sideways --
including the Root Port itself.
| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering. But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
|
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA. I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...
Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I
mentioned above. And that the Intel document mentions itself.
Casey
Powered by blists - more mailing lists