Date:   Thu, 06 Aug 2020 11:19:55 +0200
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Peter Xu <peterx@...hat.com>,
        Julia Suvorova <jsuvorov@...hat.com>,
        Andy Lutomirski <luto@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] KVM: x86: KVM_MEM_PCI_HOLE memory

"Michael S. Tsirkin" <mst@...hat.com> writes:

> On Tue, Jul 28, 2020 at 04:37:38PM +0200, Vitaly Kuznetsov wrote:
>> This is a continuation of "[PATCH RFC 0/5] KVM: x86: KVM_MEM_ALLONES
>> memory" work: 
>> https://lore.kernel.org/kvm/20200514180540.52407-1-vkuznets@redhat.com/
>> and pairs with Julia's "x86/PCI: Use MMCONFIG by default for KVM guests":
>> https://lore.kernel.org/linux-pci/20200722001513.298315-1-jusual@redhat.com/
>> 
>> PCIe config space can (depending on the configuration) be quite big but
>> is usually sparsely populated. A guest may scan it by accessing each
>> individual device's page which, when the device is missing, is supposed
>> to have 'PCI hole' semantics: reads return 0xff and writes get discarded.
>> 
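>> A minimal sketch of these semantics on the read side (illustration
>> only, not the actual code from this series):
>>
>>     #include <string.h>
>>
>>     /* Reads from a missing device's config space page return all-ones;
>>      * 'len' is the size of the guest's access (1/2/4 bytes). */
>>     static void pci_hole_read(void *data, unsigned int len)
>>     {
>>             memset(data, 0xff, len);
>>     }
>>
>>     /* Writes to the hole are simply discarded. */
>>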
>> When testing Linux kernel boot with a QEMU q35 VM and direct kernel boot
>> I observed 8193 accesses to PCI hole memory. When such an exit is handled
>> in KVM without exiting to userspace, it takes roughly 0.000001 sec.
>> Handling the same exit in userspace is six times slower (0.000006 sec), so
>> the overall difference is 0.04 sec. This may be significant for 'microvm'
>> ideas.
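>> (That is, 8193 exits * 0.000005 sec saved per exit ~= 0.04 sec.)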
>> 
>> Note, the same speed can already be achieved by using KVM_MEM_READONLY,
>> but doing this would require allocating real memory for all missing
>> devices: e.g. 8192 pages of 4 KiB give us 32 MiB. This would have to be
>> allocated for each guest separately, and for 'microvm' use-cases this is
>> likely a no-go.
>> 
>> Introduce a special KVM_MEM_PCI_HOLE memory type: userspace doesn't need
>> to back it with real memory, all reads from it are handled inside KVM and
>> return 0xff. Writes still go to userspace, but these should be extremely
>> rare.
>> 
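>> For illustration, here is a sketch of how userspace could register such
>> a slot; the slot number and guest physical address are made-up examples,
>> only the KVM_MEM_PCI_HOLE flag is what this series adds:
>>
>>     #include <linux/kvm.h>
>>     #include <sys/ioctl.h>
>>
>>     struct kvm_userspace_memory_region region = {
>>         .slot            = 10,          /* any free slot */
>>         .flags           = KVM_MEM_PCI_HOLE,
>>         .guest_phys_addr = 0xb0000000,  /* e.g. q35 MMCONFIG base */
>>         .memory_size     = 32 << 20,    /* the 8192 pages from above */
>>         .userspace_addr  = 0,           /* no backing memory needed */
>>     };
>>     ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
>>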
>> The original 'KVM_MEM_ALLONES' idea had an additional optimization: KVM
>> mapped all 'PCI hole' pages to a single read-only page stuffed with
>> 0xff. This is omitted in this submission as the benefits are unclear:
>> KVM would have to allocate SPTEs (either on demand or aggressively) and
>> this also consumes time/memory.
>
> Curious about this: if we do it aggressively on the 1st fault,
> how long does it take to allocate 256 huge page SPTEs?
> And the amount of memory seems pretty small then, right?

Right, this could work, but we'd need a 2M region (one per KVM host, of
course) filled with 0xff-s instead of a single 4k page.
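
Something along these lines, perhaps (a hypothetical sketch, not code
from the series: one host-wide 2M hugetlb allocation filled with 0xff
which the 'aggressive mapping' variant could map read-only into every
guest):

    #define _GNU_SOURCE      /* MAP_ANONYMOUS, MAP_HUGETLB */
    #include <string.h>
    #include <sys/mman.h>

    #define SZ_2M (2UL * 1024 * 1024)

    /* Allocate the single all-ones region backing all 'PCI hole' SPTEs. */
    static void *alloc_allones_region(void)
    {
            void *p = mmap(NULL, SZ_2M, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p == MAP_FAILED)
                    return NULL;
            memset(p, 0xff, SZ_2M);
            mprotect(p, SZ_2M, PROT_READ); /* guests must never write it */
            return p;
    }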

Generally, I'd like to reach an agreement on whether this feature (and
Julia's corresponding patch adding the PV feature bit) is worthwhile. In
case it is (meaning it gets merged in this simplest form), we can
suggest further improvements. It would also help if firmware (SeaBIOS,
OVMF) started recognizing the PV feature bit too; this way we'd be
seeing an even bigger improvement, and this may or may not be a
deal-breaker when it comes to the 'aggressive PTE mapping' idea.

-- 
Vitaly
