lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 18 Jun 2013 19:03:27 -0400
From:	Don Dutile <ddutile@...hat.com>
To:	Alex Williamson <alex.williamson@...hat.com>
CC:	Bjorn Helgaas <bhelgaas@...gle.com>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH] pci: Enable overrides for missing ACS capabilities

On 06/18/2013 06:22 PM, Alex Williamson wrote:
> On Tue, 2013-06-18 at 15:31 -0600, Bjorn Helgaas wrote:
>> On Tue, Jun 18, 2013 at 12:20 PM, Alex Williamson
>> <alex.williamson@...hat.com>  wrote:
>>> On Tue, 2013-06-18 at 11:28 -0600, Bjorn Helgaas wrote:
>>>> On Thu, May 30, 2013 at 12:40:19PM -0600, Alex Williamson wrote:
>>>>> PCIe ACS (Access Control Services) is the PCIe 2.0+ feature that
>>>>> allows us to control whether transactions are allowed to be redirected
>>>>> in various subnodes of a PCIe topology.  For instance, if two
>>>>> endpoints are below a root port or downsteam switch port, the
>>>>> downstream port may optionally redirect transactions between the
>>>>> devices, bypassing upstream devices.  The same can happen internally
>>>>> on multifunction devices.  The transaction may never be visible to the
>>>>> upstream devices.
>>>>>
>>>>> One upstream device that we particularly care about is the IOMMU.  If
>>>>> a redirection occurs in the topology below the IOMMU, then the IOMMU
>>>>> cannot provide isolation between devices.  This is why the PCIe spec
>>>>> encourages topologies to include ACS support.  Without it, we have to
>>>>> assume peer-to-peer DMA within a hierarchy can bypass IOMMU isolation.
>>>>>
>>>>> Unfortunately, far too many topologies do not support ACS to make this
>>>>> a steadfast requirement.  Even the latest chipsets from Intel are only
>>>>> sporadically supporting ACS.  We have trouble getting interconnect
>>>>> vendors to include the PCIe spec required PCIe capability, let alone
>>>>> suggested features.
>>>>>
>>>>> Therefore, we need to add some flexibility.  The pcie_acs_override=
>>>>> boot option lets users opt-in specific devices or sets of devices to
>>>>> assume ACS support.
>>>>
>>>> "ACS support" means the device provides an ACS capability and we
>>>> can change bits in the ACS Control Register to control how things
>>>> work.
>>>>
>>>> You are really adding a way to "assume this device always routes
>>>> peer-to-peer DMA upstream," which ultimately translates into "assume
>>>> this device can be isolated from others via the IOMMU."  I think.
>>>
>>> We've heard from one vendor that they support ACS with a NULL capability
>>> structure, ie. the ACS PCIe capability exists, but reports no ACS
>>> capabilities and allows no control.  This takes advantage of the "if
>>> supported" style wording of the spec to imply that peer-to-peer is not
>>> supported because the capability is not available.  This supported but
>>> not controllable version is what we're trying to enable here.
>>
>> I was wrong to say "we can change bits in the control register."  All
>> the control register bits are actually required to be hardwired to
>> zero when the relevant functionality is not implemented.
>>
>> The example you mention (a device with an ACS capability structure
>> that reports no supported capabilities and allows no control) sounds
>> perfectly legal as a description of a device that doesn't support
>> peer-to-peer, and I hope it doesn't require any user intervention
>> (e.g., this patch) or quirks to make it work.
>
> It requires:
>
> Subject: [PATCH v2 2/2] pci: Differentiate ACS controllable from enabled
>
>> The ACS P2P Request Redirect "must be implemented by Functions that
>> support peer-to-peer traffic with other Functions."  This example
>> device doesn't support peer-to-peer traffic, so why would it implement
>> the ACS R bit?  In fact, it looks like the R bit (and all the other
>> bits) *must* be hardwired to zero in both capability and control
>> registers.
>
> Right, if they don't support peer-to-peer then hardwiring capability and
> control to zero should indicate that and is fixed by the patch
> referenced above.
>
>>>>> The "downstream" option assumes full ACS support
>>>>> on root ports and downstream switch ports.  The "multifunction"
>>>>> option assumes the subset of ACS features available on multifunction
>>>>> endpoints and upstream switch ports are supported.  The "id:nnnn:nnnn"
>>>>> option enables ACS support on devices matching the provided vendor
>>>>> and device IDs, allowing more strategic ACS overrides.  These options
>>>>> may be combined in any order.  A maximum of 16 id specific overrides
>>>>> are available.  It's suggested to use the most limited set of options
>>>>> necessary to avoid completely disabling ACS across the topology.
>>>>
>>>> Probably I'm confused by your use of "assume full ACS support,"
>>>
>>> [on root ports and downstream ports]
>>>
>>>>   but I
>>>> don't understand the bit about "completely disabling ACS."
>>>
>>> [across the topology]
>>>
>>>>    If we use
>>>> more options than necessary, it seems like we'd assume more isolation
>>>> than really exists.  That sounds like pretending ACS is *enabled* in
>>>> more places than it really is.
>>>
>>> Exactly.  I'm just trying to suggest that booting with
>>> pcie_acs_override=downstream,multifunction is not generally recommended
>>> because it effectively disables ACS checking across the topology and
>>> assumes isolation where there may be none.  In other words, if
>>> everything is overriding ACS checks, then we've disabled any benefit of
>>> doing the checking in the first place (so I really mean disable
>>> checking, not disable ACS).
>>
>> Yep, the missing "checking" is what is confusing.  Also, I think it
>> would be good to make the implication more explicit -- using this
>> option makes the kernel assume isolation where it may not actually
>> exist -- because the users of this option don't know about checking
>> anyway; that's an internal implementation detail.
>
> More explicit in Documentation/kernel-parameters.txt?
>
>>> Instead, the recommendation is to be more
>>> selective, possibly opting for just "downstream" or even better, using
>>> the specific IDs for devices which are known or suspected to not allow
>>> peer-to-peer.
>>>
>>>>> Note to hardware vendors, we have facilities to permanently quirk
>>>>> specific devices which enforce isolation but not provide an ACS
>>>>> capability.  Please contact me to have your devices added and save
>>>>> your customers the hassle of this boot option.
>>>>
>>>> Who do you expect to decide whether to use this option?  I think it
>>>> requires intimate knowledge of how the device works.
>>>>
>>>> I think the benefit of using the option is that it makes assignment of
>>>> devices to guests more flexible, which will make it attractive to users.
>>>> But most users have no way of knowing whether it's actually *safe* to
>>>> use this.  So I worry that you're adding an easy way to pretend isolation
>>>> exists when there's no good way of being confident that it actually does.
>>>>
>>>> I see the warning you added for this case; I just don't feel good about
>>>> it.  Maybe the idea is that, e.g., Red Hat will research certain devices
>>>> and recommend the option for those cases, and sign up to support that
>>>> config.  I assume you would not be willing to support its use unless
>>>> Red Hat specifically recommended it.
>>>
>>> That pretty much sums it up.  We're working with vendors to try to
>>> figure out which devices do not support ACS but don't allow
>>> peer-to-peer, but it's difficult to get decisive statements to that
>>> affect.  Legacy KVM device assignment relied on libvirt to do this ACS
>>> checking and it does a poor job of it, allowing far more devices to be
>>> assigned with ACS enforced than it really should.  It's also trivial to
>>> disable libvirt's enforcement of ACS and people do it every day w/o
>>> really thinking of the implications.  With VFIO-based device assignment
>>> we move to a model where the kernel is enforcing device isolation rather
>>> than it being at the whim of a userspace service.  Now we have not only
>>> a more complete ACS test, but no way to make it more lenient.  Devices
>>> that were previously allowed, no longer are and there's no way around it
>>> without this patch.  However, this patch gives us more control than the
>>> global disable switch in libvirt.  We can still be selective and fine
>>> tune the shape of the groups, while hopefully adhering to the isolation
>>> exposed by the hardware elsewhere.
>>>
>>> I agree that it's difficult to determine whether it's safe to override,
>>> but I don't see a way around it.  If it was obvious how to maintain
>>> isolation, we'd do it automatically.  We need better ACS support from
>>> vendors.  In the mean time, this allows people to complain to their
>>> hardware vendors and opt-in to using the device anyway.  The warning is
>>> there because if I'm debugging odd problems, I want to know that
>>> isolation may be compromised, as the warning indicates.  Thanks,
>>
>> I wonder if we should taint the kernel if this option is used (but not
>> for specific devices added to pci_dev_acs_enabled[]).  It would also
>> be nice if pci_dev_specific_acs_enabled() gave some indication in
>> dmesg for the specific devices you're hoping to add to
>> pci_dev_acs_enabled[].  It's not an enumeration-time quirk right now,
>> so I'm not sure how we'd limit it to one message per device.
>
> Right, setup vs use and getting single prints is a lot of extra code.
> Tainting is troublesome for support, Don had some objections when I
> suggested the same to him.
>
For RH GSS (Global Support Services), a 'taint' in the kernel printk means
RH doesn't support that system.  The 'non-support' due to 'taint' being printed
out in this case may be incorrect -- RH may support that use, at least until
a more sufficient patched kernel is provided.
Thus my dissension that 'taint' be output.  WARN is ok. 'driver beware',
'unleashed dog afoot'.... sure...

>> I assume that if RH knows about certain devices that are safe in this
>> respect, you'd just add them to pci_dev_acs_enabled[] up front, and
>> the command-line option is really just a workaround until you can spin
>> a new kernel that has the table updated?
>
> Yep, needing to know about this option is not user friendly.  We want
> things to "just work" whenever possible.  There are going to be cases
> where there are obscure devices, even mainstream devices, that need
> this.  Whether it's a temporary or long lived solution depends on what
> kind of answers we can get from vendors.
>
>> It might even make sense to simplify this option so it just assumes
>> *all* devices can be isolated, and get rid of the
>> "downstream/multifunction/vendor&  device ID" stuff.  That would be
>> much easier to explain, and it would make it more obvious that we're
>> really doing something pretty scary here.
>
> Seems like that makes it harder to isolate specific devices for
> promotion to pci_dev_acs_enabled[] and more likely that the user will
> turn the whole thing off for a single device and forget about isolation
> of other devices.  It's a time bomb that legacy KVM assignment and
> libvirt make this so easy to work around today, I'd like to be more
> selective for vfio in the future.
>
>> Bjorn (there is one doc comment below)
>>
>>> Alex
>>>
>>>>> Signed-off-by: Alex Williamson<alex.williamson@...hat.com>
>>>>> ---
>>>>>   Documentation/kernel-parameters.txt |   10 +++
>>>>>   drivers/pci/quirks.c                |  102 +++++++++++++++++++++++++++++++++++
>>>>>   2 files changed, 112 insertions(+)
>>>>>
>>>>> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
>>>>> index 47bb23c..a60e6ad 100644
>>>>> --- a/Documentation/kernel-parameters.txt
>>>>> +++ b/Documentation/kernel-parameters.txt
>>>>> @@ -2349,6 +2349,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>>>>>              nomsi   Do not use MSI for native PCIe PME signaling (this makes
>>>>>                      all PCIe root ports use INTx for all services).
>>>>>
>>>>> +   pcie_acs_override =
>>>>> +                   [PCIE] Override missing PCIe ACS support for:
>>>>> +           downstream
>>>>> +                   All downstream ports - full ACS capabilties
>>>>> +           multifunction
>>>>> +                   All multifunction devices - multifunction ACS subset
>>>>> +           id:nnnn:nnnn
>>>>> +                   Specfic device - full ACS capabilities
>>>>> +                   Specified as vid:did (vendor/device ID) in hex
>>
>> This should say something about "device isolation support," I think.
>> It's too big a leap from "missing ACS support" to expect users to make
>> it.
>
> Ok.  Thanks,
>
> Alex
>
>>>>> +
>>>>>      pcmv=           [HW,PCMCIA] BadgePAD 4
>>>>>
>>>>>      pd.             [PARIDE]
>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>> index 0369fb6..c7609f6 100644
>>>>> --- a/drivers/pci/quirks.c
>>>>> +++ b/drivers/pci/quirks.c
>>>>> @@ -3292,11 +3292,113 @@ struct pci_dev *pci_get_dma_source(struct pci_dev *dev)
>>>>>      return pci_dev_get(dev);
>>>>>   }
>>>>>
>>>>> +static bool acs_on_downstream;
>>>>> +static bool acs_on_multifunction;
>>>>> +
>>>>> +#define NUM_ACS_IDS 16
>>>>> +struct acs_on_id {
>>>>> +   unsigned short vendor;
>>>>> +   unsigned short device;
>>>>> +};
>>>>> +static struct acs_on_id acs_on_ids[NUM_ACS_IDS];
>>>>> +static u8 max_acs_id;
>>>>> +
>>>>> +static __init int pcie_acs_override_setup(char *p)
>>>>> +{
>>>>> +   if (!p)
>>>>> +           return -EINVAL;
>>>>> +
>>>>> +   while (*p) {
>>>>> +           if (!strncmp(p, "downstream", 10))
>>>>> +                   acs_on_downstream = true;
>>>>> +           if (!strncmp(p, "multifunction", 13))
>>>>> +                   acs_on_multifunction = true;
>>>>> +           if (!strncmp(p, "id:", 3)) {
>>>>> +                   char opt[5];
>>>>> +                   int ret;
>>>>> +                   long val;
>>>>> +
>>>>> +                   if (max_acs_id>= NUM_ACS_IDS - 1) {
>>>>> +                           pr_warn("Out of PCIe ACS override slots (%d)\n",
>>>>> +                                   NUM_ACS_IDS);
>>>>> +                           goto next;
>>>>> +                   }
>>>>> +
>>>>> +                   p += 3;
>>>>> +                   snprintf(opt, 5, "%s", p);
>>>>> +                   ret = kstrtol(opt, 16,&val);
>>>>> +                   if (ret) {
>>>>> +                           pr_warn("PCIe ACS ID parse error %d\n", ret);
>>>>> +                           goto next;
>>>>> +                   }
>>>>> +                   acs_on_ids[max_acs_id].vendor = val;
>>>>> +
>>>>> +                   p += strcspn(p, ":");
>>>>> +                   if (*p != ':') {
>>>>> +                           pr_warn("PCIe ACS invalid ID\n");
>>>>> +                           goto next;
>>>>> +                   }
>>>>> +
>>>>> +                   p++;
>>>>> +                   snprintf(opt, 5, "%s", p);
>>>>> +                   ret = kstrtol(opt, 16,&val);
>>>>> +                   if (ret) {
>>>>> +                           pr_warn("PCIe ACS ID parse error %d\n", ret);
>>>>> +                           goto next;
>>>>> +                   }
>>>>> +                   acs_on_ids[max_acs_id].device = val;
>>>>> +                   max_acs_id++;
>>>>> +           }
>>>>> +next:
>>>>> +           p += strcspn(p, ",");
>>>>> +           if (*p == ',')
>>>>> +                   p++;
>>>>> +   }
>>>>> +
>>>>> +   if (acs_on_downstream || acs_on_multifunction || max_acs_id)
>>>>> +           pr_warn("Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA\n");
>>>>> +
>>>>> +   return 0;
>>>>> +}
>>>>> +early_param("pcie_acs_override", pcie_acs_override_setup);
>>>>> +
>>>>> +static int pcie_acs_overrides(struct pci_dev *dev, u16 acs_flags)
>>>>> +{
>>>>> +   int i;
>>>>> +
>>>>> +   /* Never override ACS for legacy devices or devices with ACS caps */
>>>>> +   if (!pci_is_pcie(dev) ||
>>>>> +       pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS))
>>>>> +           return -ENOTTY;
>>>>> +
>>>>> +   for (i = 0; i<  max_acs_id; i++)
>>>>> +           if (acs_on_ids[i].vendor == dev->vendor&&
>>>>> +               acs_on_ids[i].device == dev->device)
>>>>> +                   return 1;
>>>>> +
>>>>> +   switch (pci_pcie_type(dev)) {
>>>>> +   case PCI_EXP_TYPE_DOWNSTREAM:
>>>>> +   case PCI_EXP_TYPE_ROOT_PORT:
>>>>> +           if (acs_on_downstream)
>>>>> +                   return 1;
>>>>> +           break;
>>>>> +   case PCI_EXP_TYPE_ENDPOINT:
>>>>> +   case PCI_EXP_TYPE_UPSTREAM:
>>>>> +   case PCI_EXP_TYPE_LEG_END:
>>>>> +   case PCI_EXP_TYPE_RC_END:
>>>>> +           if (acs_on_multifunction&&  dev->multifunction)
>>>>> +                   return 1;
>>>>> +   }
>>>>> +
>>>>> +   return -ENOTTY;
>>>>> +}
>>>>> +
>>>>>   static const struct pci_dev_acs_enabled {
>>>>>      u16 vendor;
>>>>>      u16 device;
>>>>>      int (*acs_enabled)(struct pci_dev *dev, u16 acs_flags);
>>>>>   } pci_dev_acs_enabled[] = {
>>>>> +   { PCI_ANY_ID, PCI_ANY_ID, pcie_acs_overrides },
>>>>>      { 0 }
>>>>>   };
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ