lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1f1beb97-2a36-dcc0-f09a-59af19663ae2@amazon.com>
Date:   Thu, 29 Sep 2022 10:12:08 +0100
From:   Alexander Graf <graf@...zon.com>
To:     Ajay Kaher <akaher@...are.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>
CC:     "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        Srivatsa Bhat <srivatsab@...are.com>,
        "srivatsa@...il.mit.edu" <srivatsa@...il.mit.edu>,
        Alexey Makhalov <amakhalov@...are.com>,
        Vasavi Sirnapalli <vsirnapalli@...are.com>,
        "er.ajay.kaher@...il.com" <er.ajay.kaher@...il.com>,
        "willy@...radead.org" <willy@...radead.org>,
        "Nadav Amit" <namit@...are.com>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "jailhouse-dev@...glegroups.com" <jailhouse-dev@...glegroups.com>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
        "acrn-dev@...ts.projectacrn.org" <acrn-dev@...ts.projectacrn.org>,
        "helgaas@...nel.org" <helgaas@...nel.org>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on all hypervisor


On 29.09.22 07:36, Ajay Kaher wrote:
>> On 13/09/22, 7:05 PM, "Vitaly Kuznetsov" <vkuznets@...hat.com> wrote:
>>> Thanks Vitaly for your response.
>>>
>>> 1. we have multiple objects of struct pci_raw_ops, 2. adding 'priority' field to struct pci_raw_ops
>>> doesn't seems to be appropriate as need to take decision which object of struct pci_raw_ops has
>>> to be used, not something with-in struct pci_raw_ops.
>> I'm not sure I follow, you have two instances of 'struct pci_raw_ops'
>> which are called 'raw_pci_ops' and 'raw_pci_ext_ops'. What if you do
>> something like (completely untested):
>>
>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
>> index 70533fdcbf02..fb8270fa6c78 100644
>> --- a/arch/x86/include/asm/pci_x86.h
>> +++ b/arch/x86/include/asm/pci_x86.h
>> @@ -116,6 +116,7 @@ extern void (*pcibios_disable_irq)(struct pci_dev *dev);
>> extern bool mp_should_keep_irq(struct device *dev);
>>
>> struct pci_raw_ops {
>> +       int rating;
>>           int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
>>                                                 int reg, int len, u32 *val);
>>           int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn,
>> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
>> index ddb798603201..e9965fd11576 100644
>> --- a/arch/x86/pci/common.c
>> +++ b/arch/x86/pci/common.c
>> @@ -40,7 +40,8 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
>>   int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
>>                                                  int reg, int len, u32 *val)
>> {
>> -       if (domain == 0 && reg < 256 && raw_pci_ops)
>> +       if (domain == 0 && reg < 256 && raw_pci_ops &&
>> +           (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating))
>>                  return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
>>          if (raw_pci_ext_ops)
>>                  return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
>> @@ -50,7 +51,8 @@ int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
>>   int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
>>                                                  int reg, int len, u32 val)
>> {
>> -       if (domain == 0 && reg < 256 && raw_pci_ops)
>> +       if (domain == 0 && reg < 256 && raw_pci_ops &&
>> +           (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating))
>>                  return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
>>           if (raw_pci_ext_ops)
>>                  return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
>>
>> and then somewhere in Vmware hypervisor initialization code
>> (arch/x86/kernel/cpu/vmware.c) you do
>>
>>   raw_pci_ext_ops->rating = 100;
> Thanks Vitaly, for your review and helping us to improve the code.
>
> I was working to make changes as you suggested, but before sending v3 would like to
> discuss on following:
>
> If we add rating with-in struct pci_raw_ops then we can't have pci_mmcfg as const,
> and following change is must in arch/x86/pci/mmconfig_64.c:
>
> -const struct pci_raw_ops pci_mmcfg = {
> +struct pci_raw_ops pci_mmcfg = {
>          .read =         pci_mmcfg_read,
>          .write =        pci_mmcfg_write,
> };
>
> So to avoid this change, is it fine to have global bool prefer_raw_pci_ext_ops?
>
> And raw_pci_read() will have following change:
>
> -       if (domain == 0 && reg < 256 && raw_pci_ops)
> +       if (domain == 0 && reg < 256 && raw_pci_ops &&
> +            (!prefer_raw_pci_ext_ops ||  !raw_pci_ext_ops)
>
>> why wouldn't it work?
>>
>> (diclaimer: completely untested, raw_pci_ops/raw_pci_ext_ops
>> initialization has to be checked so 'rating' is not garbage).
>>
>>> It's a generic solution for all hypervisor (sorry for earlier wrong
>>> Subject), not specific to VMware. Further looking for feedback if it's
>>> impacting to any hypervisor.
>> That's the tricky part. We can check modern hypervisor versions, but
>> what about all other versions in existence? How can we know that there's
>> no QEMU/Hyper-V/... version out there where MMIO path is broken? I'd
>> suggest we limit the change to Vmware hypervisor, other hypervisors may
>> use the same mechanism (like the one above) later (but the person
>> suggesting the patch is always responsible for the research why it is
>> safe to do so).
> Ok, as of now we will make this change specific to VMware hypervisor.


Is there a way we can make it an ACPI property in MCFG to have the 
environment self-describe the fact that it's safe to do ECAM access for 
config space access over legacy PIO? That way we don't need to patch 
guests every time a hypervisor decides that it's safe to prefer ECAM.

Also, Michael (CC'ed) mentioned that according to spec, your PCIe host 
bridge with PCI_COMMAND.MEMORY=0 would stop responding to its ECAM 
window. Given that most ARM systems have no PIO fallback path, we want 
to make sure we never hit that condition.


Alex





Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ