Message-ID: <058DA908-87F7-438E-9850-9CD9DCCFD928@zytor.com>
Date: Mon, 11 Jul 2022 11:18:40 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Ajay Kaher <akaher@...are.com>, Nadav Amit <namit@...are.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: Matthew Wilcox <willy@...radead.org>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
Srivatsa Bhat <srivatsab@...are.com>,
"srivatsa@...il.mit.edu" <srivatsa@...il.mit.edu>,
Alexey Makhalov <amakhalov@...are.com>,
Anish Swaminathan <anishs@...are.com>,
Vasavi Sirnapalli <vsirnapalli@...are.com>,
"er.ajay.kaher@...il.com" <er.ajay.kaher@...il.com>,
Bjorn Helgaas <helgaas@...nel.org>
Subject: Re: [PATCH] MMIO should have more priority then IO
On July 11, 2022 10:53:54 AM PDT, Ajay Kaher <akaher@...are.com> wrote:
>
>On 11/07/22, 10:34 PM, "Nadav Amit" <namit@...are.com> wrote:
>
>> On Jul 10, 2022, at 11:31 PM, Ajay Kaher <akaher@...are.com> wrote:
>>
>> During boot-time there are many PCI reads. Currently, when these reads are
>> performed by a virtual machine, they all cause a VM-exit, and therefore each
>> one of them induces a considerable overhead.
>>
>> When using MMIO (but not PIO), it is possible to map the PCI BARs of the
>> virtual machine to some memory area that holds the values that the “emulated
>> hardware” is supposed to return. The memory region is mapped as “read-only”
>> in the NPT/EPT, so reads from these BAR regions would be treated as regular
>> memory reads. Writes would still be trapped and emulated by the hypervisor.
>
>I guess there is a typo in the above paragraph: it should be the per-device PCI
>config space, i.e. the 4KB ECAM region, not the PCI BARs. Please read the above paragraph as:
>
>When using MMIO (but not PIO), it is possible to map the PCI config space of the
>virtual machine to some memory area that holds the values that the “emulated
>hardware” is supposed to return. The memory region is mapped as “read-only”
>in the NPT/EPT, so reads from this PCI config space would be treated as regular
>memory reads. Writes would still be trapped and emulated by the hypervisor.
>
>We will send a v2 or a new patch, which will be VMware-specific.
>
>> I have a vague recollection from some similar project that I had 10 years
>> ago that this might not work for certain emulated device registers. For
>> instance, some hardware registers, specifically those that report hardware
>> events, are “clear-on-read”. Apparently, Ajay took that into consideration.
>>
>> That is the reason for this quite amazing difference - several orders of
>> magnitude - between the overhead that is caused by raw_pci_read(): 120us for
>> PIO and 100ns for MMIO. Admittedly, I do not understand why PIO access would
>> take 120us (I would have expected it to be 10 times faster, at least), but
>> the benefit is quite clear.
>
>
>
For one thing, please correct the explanation.
It does not take "more PCI cycles" to use PIO; the cycles are exactly the same, in fact. The improvements all come from the CPU and VMM interfaces; on the PCI bus, PIO and MMIO are (mostly) just address spaces.
"Using MMIO may allow a VMM to map a shadow memory area readonly, so read transactions can be executed without needing any VMEXIT at all. In contrast, PIO transactions to PCI configuration space are done through an indirect address-data interface, requiring two VMEXITs per transaction regardless of the properties of the underlying register."
You should call out exactly what is being done to prevent incorrect handling of registers with read side effects. I believe that would all be on the VMM side. Unfortunately, the presence of a register with read side effects would probably mean losing this optimization for the entire 4K page, i.e. the entire function; but read side effects have always been discouraged, although not prohibited, in config space.
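
To make the two points above concrete, here is a minimal sketch in C. It only computes addresses and a mapping decision; it performs no port or memory I/O. The port numbers and bit layouts are the standard PCI configuration mechanism and ECAM layout. All helper, struct, and enum names are made up for illustration and are not kernel or VMM code, and the per-function policy is just one possible reading of the concern about read side effects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PIO: indirect address-data interface. The guest writes the target to the
 * address port, then accesses the data port. Both port accesses trap, hence
 * two VMEXITs per config transaction. */
#define PCI_CONF_ADDR_PORT 0xCF8u
#define PCI_CONF_DATA_PORT 0xCFCu

/* Value written to the address port (hypothetical helper name). */
static inline uint32_t pio_conf_address(uint8_t bus, uint8_t dev,
                                        uint8_t fn, uint8_t reg)
{
    return 0x80000000u |                      /* enable bit */
           ((uint32_t)bus << 16) |
           ((uint32_t)(dev & 0x1f) << 11) |
           ((uint32_t)(fn & 0x7) << 8) |
           (reg & 0xfcu);                     /* dword-aligned register */
}

/* MMIO (ECAM): each function owns one 4 KiB window, so a single read-only
 * page mapping in the NPT/EPT covers exactly one function. */
static inline uint64_t ecam_offset(uint8_t bus, uint8_t dev,
                                   uint8_t fn, uint16_t reg)
{
    return ((uint64_t)bus << 20) |
           ((uint64_t)(dev & 0x1f) << 15) |
           ((uint64_t)(fn & 0x7) << 12) |
           (reg & 0xfffu);
}

/* Hypothetical VMM-side policy: page granularity means one register with
 * read side effects forces trapping for the whole function. */
enum cfg_map_mode { CFG_MAP_READONLY, CFG_TRAP_ALL };

struct cfg_reg_desc {
    unsigned offset;          /* offset within the 4 KiB window */
    bool read_side_effect;    /* e.g. a clear-on-read register */
};

static enum cfg_map_mode choose_map_mode(const struct cfg_reg_desc *regs,
                                         size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (regs[i].read_side_effect)
            return CFG_TRAP_ALL;
    return CFG_MAP_READONLY;
}
```

With this layout, function 1 of a device starts exactly 4096 bytes after function 0, which is why the optimization is won or lost per function rather than per register.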