Date:   Mon, 10 Oct 2022 13:05:24 -0400
From:   "Michael S. Tsirkin" <mst@...hat.com>
To:     Nadav Amit <namit@...are.com>
Cc:     Alexander Graf <graf@...zon.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Ajay Kaher <akaher@...are.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        Srivatsa Bhat <srivatsab@...are.com>,
        "srivatsa@...il.mit.edu" <srivatsa@...il.mit.edu>,
        Alexey Makhalov <amakhalov@...are.com>,
        Vasavi Sirnapalli <vsirnapalli@...are.com>,
        "er.ajay.kaher@...il.com" <er.ajay.kaher@...il.com>,
        "willy@...radead.org" <willy@...radead.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "jailhouse-dev@...glegroups.com" <jailhouse-dev@...glegroups.com>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
        "helgaas@...nel.org" <helgaas@...nel.org>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on all hypervisor

On Tue, Oct 04, 2022 at 06:48:11PM +0000, Nadav Amit wrote:
> On Oct 4, 2022, at 1:22 AM, Alexander Graf <graf@...zon.com> wrote:
> 
> > Hey Nadav,
> > 
> > On 03.10.22 19:34, Nadav Amit wrote:
> >> On Oct 3, 2022, at 8:03 AM, Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
> >> 
> >>> Not my call but rather the PCI maintainer's, but IMHO dropping 'const' is
> >>> better, introducing a new global var is our 'last resort' and should be
> >>> avoided whenever possible. Alternatively, you can add a
> >>> raw_pci_ext_ops_preferred() function checking something within 'struct
> >>> hypervisor_x86' but I'm unsure if it's better.
> >>> 
> >>> Also, please check Alex' question/suggestion.
> >> Here is my take (and Ajay knows probably more than me):
> >> 
> >> Looking briefly at MCFG, I do not see a clean way of using the ACPI table.
> >> The two options are either to use a reserved field (which who knows, might
> >> be used one day) or some OEM ID. I am also not familiar with
> >> PCI_COMMAND.MEMORY=0, so Ajay can hopefully give some answer about that.
> >> 
> >> Anyhow, I understand (although I do not relate to) the objection to a new global
> >> variable. How about explicitly calling this hardware bug a “bug” and using
> >> the proper infrastructure? Calling it explicitly a bug may even push whoever
> >> can to resolve it.
> > 
> > 
> > I am a lot more concerned with how we propagate it externally than
> > within Linux. If we hard code that all Linux kernels 6.2+ that are
> > running in VMware prefer ECAM over PIO, we lock ourselves into that
> > stance for better or worse, which means:
> > 
> > * All past and future versions of any VMware hypervisor product have to
> > always allow ECAM access for any PCIe config space write
> > * No other hypervisor benefits from any of this without upstream code change
> > * No real hardware platform benefits from this without upstream code change
> > 
> > By moving it into MCFG, we can create a path for the outside environment
> > to tell the OS whether it's safe to use ECAM always. This obviously
> > doesn't work with MCFG as it stands today, we'd have to propose an MCFG
> > spec change to the PCI SIG's "PCI Firmware Specification" to add the
> > respective field. Future VMware versions could then always expose the
> > flag - and if you find it broken, remove it again.
> > 
> > Putting all of the logic on which system potentially prefers ECAM over
> > PIO config space access into Linux is just a big hack that we should
> > avoid as much as possible.
> 
> Thanks Alex. You raise important points. Let me try to break down your
> concerns slightly differently:
> 
> 1. Enabling MMIO access should be selective, and potentially controlled by
> the hypervisor. At the very least, a "chicken bit" is needed.
> 
> 2. PCI SIG would change its specifications to address an unclear hardware bug.
> 
> I think (1) makes sense and we can discuss different ways of addressing it.
> But (2) would not happen in a reasonable timeline and seems to me an
> unnecessary complication.
> 
> But before we discuss how to address the issue, perhaps we need to first
> understand it better. I am not sure that I understand this MMIO bug, and so
> far nobody was able to provide exact details.
> 
> So I went to have a look. It might not be super helpful, but for the record,
> here is what I collected.
> 
> First, we have commit d6ece5491ae71d ("i386/x86-64 Correct for broken MCFG
> tables on K8 systems"). It set out to "try to discover all devices on bus 0
> that are unreachable using MM and fallback for them." Interestingly, it
> seems similar to FreeBSD code (commit 2d10570afe2b3e) that also mentions K8
> and has similar detection logic in FreeBSD's pcie_cfgregopen().
> 
> Then commit a0ca9909609470 ("PCI x86: always use conf1 to access config
> space below 256 bytes"). The correspondence [1] mentions some bugs: ATI
> chipset, VIA chipset, Intel 3 Series Express chipset family and some reports
> on Nvidia. It turned out that some devices had problems when probed the way
> the previous patch did in order to figure out whether MMIO is broken.

There's also a statement by Linus that MCFG might not cover all buses
in that thread.  I didn't think the implications through yet ...
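
To make the coverage concern concrete, here is a minimal sketch of how an
accessor might choose between ECAM (MMIO) and conf1 (PIO). This is not kernel
code: struct mcfg_region, pick_method() and the prefer_mmio flag are
hypothetical names made up for illustration, and the ECAM base is assumed to
correspond to bus 0 of the segment. The point is only that a "prefer MMIO"
policy still needs a fallback for buses that no MCFG entry covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of one MCFG allocation entry (illustrative names). */
struct mcfg_region {
	uint64_t base;      /* ECAM base, assumed to map bus 0 of the segment */
	uint16_t segment;
	uint8_t  start_bus; /* first bus covered by this entry */
	uint8_t  end_bus;   /* last bus covered by this entry */
};

enum cfg_method { CFG_NONE, CFG_CONF1, CFG_ECAM };

/* ECAM layout: 1 MiB per bus, 32 KiB per device, 4 KiB per function. */
static uint64_t ecam_addr(const struct mcfg_region *r, uint8_t bus,
			  uint8_t dev, uint8_t fn, uint16_t reg)
{
	assert(dev < 32 && fn < 8 && reg < 4096);
	return r->base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
	       ((uint64_t)fn << 12) + reg;
}

/* conf1 address word for port 0xCF8: only the first 256 bytes are reachable. */
static uint32_t conf1_addr(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
	assert(dev < 32 && fn < 8 && reg < 256);
	return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
	       ((uint32_t)fn << 8) | (reg & 0xfcu);
}

/*
 * Pick an access method. Without prefer_mmio this follows the "conf1 below
 * 256 bytes" policy; with it, ECAM is used whenever the bus is covered.
 */
static enum cfg_method pick_method(const struct mcfg_region *r,
				   bool prefer_mmio, uint16_t seg,
				   uint8_t bus, uint16_t reg)
{
	bool covered = r && seg == r->segment &&
		       bus >= r->start_bus && bus <= r->end_bus;
	bool conf1_ok = seg == 0 && reg < 256;

	if (covered && (prefer_mmio || !conf1_ok))
		return CFG_ECAM;
	return conf1_ok ? CFG_CONF1 : CFG_NONE;
}
```

With an entry covering buses 0x00-0x3f, a read on bus 0x40 falls back to
conf1 even when MMIO is preferred, and extended config space there is simply
unreachable.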

> All of these bugs are circa 2008, of course. And note that FreeBSD did not
> take a similar path. The correspondence around the Linux patch is endless. I
> admit that I did not understand whether the issues were eventually found to
> be per-bus or per-device.
> 
> 
> Back to the matter at hand. The benefit of using the MCFG approach that you
> propose is that it can enable native systems to use MMIO as well. However,
> since the list of bugs is unclear and the problems might be device-specific,
> it is not clear what information BIOSes have that Linux doesn’t. In other
> words, the benefit of getting it into the specifications is questionable,
> and the complexity+time is high.
> 
> Can we agree that the feature would be enabled explicitly by the hypervisor
> and Linux would enable it based on the hypervisor input (through some
> channel)?
> 
> Thanks,
> Nadav
> 
> [1] https://lore.kernel.org/all/20080112144030.GA19279@jurassic.park.msu.ru/T/#u
