Message-ID: <7e175521-38bb-49f0-b1fb-8820f8708c9c@amazon.co.uk>
Date: Fri, 26 Jul 2024 07:55:16 +0100
From: Patrick Roy <roypat@...zon.co.uk>
To: "Vlastimil Babka (SUSE)" <vbabka@...nel.org>, <seanjc@...gle.com>,
	<pbonzini@...hat.com>, <akpm@...ux-foundation.org>, <dwmw@...zon.co.uk>,
	<rppt@...nel.org>, <david@...hat.com>
CC: <tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
	<dave.hansen@...ux.intel.com>, <x86@...nel.org>, <hpa@...or.com>,
	<willy@...radead.org>, <graf@...zon.com>, <derekmn@...zon.com>,
	<kalyazin@...zon.com>, <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<linux-mm@...ck.org>, <dmatlack@...gle.com>, <tabba@...gle.com>,
	<chao.p.peng@...ux.intel.com>, <xmarcalx@...zon.co.uk>
Subject: Re: [RFC PATCH 0/8] Unmapping guest_memfd from Direct Map



On Mon, 2024-07-22 at 13:28 +0100, "Vlastimil Babka (SUSE)" wrote:
>> === Implementation ===
>>
>> This patch series introduces a new flag to the `KVM_CREATE_GUEST_MEMFD`
>> ioctl to remove its pages from the direct map when they are allocated.
>> When trying to run a guest from such a VM, we now face the problem that
>> without either userspace or kernelspace mappings of guest_memfd, KVM
>> cannot access guest memory to, for example, do MMIO emulation or access
>> memory used for guest/host communication. We have multiple options for
>> solving this when running non-CoCo VMs: (1) implement a TDX-light
>> solution, where the guest shares memory that KVM needs to access, and
>> relies on paravirtual solutions where this is not possible (e.g. MMIO),
>> (2) have KVM use userspace mappings of guest_memfd (e.g. a
>> memfd_secret-style solution), or (3) dynamically reinsert pages into the
>> direct map whenever KVM wants to access them.
>>
>> This RFC goes for option (3). Option (1) is a lot of overhead for very
>> little gain, since we are not actually constrained by a physical
>> inability to access guest memory (e.g. we are not in a TDX context where
>> accesses to guest memory cause a #MC). Option (2) has previously been
>> rejected [1].
> 
> Do the pages have to have the same address when they are temporarily mapped?
> Wouldn't it be easier to do something similar to kmap_local_page() used for
> HIMEM? I.e. you get a temporary kernel mapping to do what's needed, but it
> doesn't have to alter the shared directmap.
> 
> Maybe that was already discussed somewhere as unsuitable but didn't spot it
> here.

For what I had prototyped here, there's no requirement to have the pages
mapped at the same address (I remember briefly looking at memremap to
achieve the temporary mappings, but since that doesn't work for normal
memory, I gave up on that path). However, I think guest_memfd is moving
in a direction where ranges marked as "in-place shared" (e.g. those
that are temporarily reinserted into the direct map in this RFC) should
be able to be GUP'd [1]. I think for that the direct map entries would
need to be present, right?
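
To make the "same address" distinction concrete, here is a rough
userspace analogy only; it is not code from this series and it does not
touch the direct map. It assumes a Linux/POSIX system and uses
mprotect() as a stand-in for "reinsert, access, remove again": the
memory becomes accessible at its existing address for the duration of
the access. A kmap_local_page()-style approach would instead return a
separate temporary mapping. The helper names alloc_hidden(),
hidden_write() and hidden_read() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Userspace stand-in for memory without a usable mapping: reserve an
 * anonymous region with all access revoked (PROT_NONE).
 */
static void *alloc_hidden(size_t len)
{
	void *p = mmap(NULL, len, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Analogy for option (3): make the region accessible at its existing
 * address, copy data in, then hide it again. "region" must be the
 * page-aligned pointer returned by alloc_hidden().
 */
static int hidden_write(void *region, size_t region_len, size_t off,
			const void *src, size_t n)
{
	assert(off <= region_len && n <= region_len - off);

	if (mprotect(region, region_len, PROT_READ | PROT_WRITE) != 0)
		return -1;
	memcpy((char *)region + off, src, n);
	return mprotect(region, region_len, PROT_NONE);
}

/* Same pattern for reads: open the window, copy out, close it again. */
static int hidden_read(void *region, size_t region_len, size_t off,
		       void *dst, size_t n)
{
	assert(off <= region_len && n <= region_len - off);

	if (mprotect(region, region_len, PROT_READ) != 0)
		return -1;
	memcpy(dst, (const char *)region + off, n);
	return mprotect(region, region_len, PROT_NONE);
}
```

A caller would allocate a region once with alloc_hidden() and route
every access through hidden_write()/hidden_read(); any direct
dereference outside those windows faults, which is the property the
analogy is meant to show.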

>> In this patch series, we make sufficient parts of KVM gmem-aware to be
>> able to boot a Linux initrd from private memory on x86. These include
>> KVM's MMIO emulation (including guest page table walking) and kvm-clock.
>> For VM types which do not allow accessing gmem, we return -EFAULT and
>> attempt to prepare a KVM_EXIT_MEMORY_FAULT.
>>
>> Additionally, this patch series adds support for "restricted" userspace
>> mappings of guest_memfd, which work similarly to memfd_secret (e.g.
>> they disallow get_user_pages) and allow handling I/O and loading the
>> guest kernel in a simple way. Support for this is completely independent
>> of the rest of the functionality introduced in this patch series.
>> However, it is required to build a minimal hypervisor PoC that actually
>> allows booting a VM from a disk.
 
[1]: https://lore.kernel.org/kvm/489d1494-626c-40d9-89ec-4afc4cd0624b@redhat.com/T/#mc944a6fdcd20a35f654c2be99f9c91a117c1bed4
