Message-ID: <DE2N8AOQ1A0Y.1PVEXY6ULPCFV@google.com>
Date: Fri, 07 Nov 2025 17:37:03 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Brendan Jackman <jackmanb@...gle.com>, Patrick Roy <patrick.roy@...pus.lmu.de>
Cc: Patrick Roy <roypat@...zon.co.uk>, <pbonzini@...hat.com>, <corbet@....net>,
<maz@...nel.org>, <oliver.upton@...ux.dev>, <joey.gouly@....com>,
<suzuki.poulose@....com>, <yuzenghui@...wei.com>, <catalin.marinas@....com>,
<will@...nel.org>, <tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>, <x86@...nel.org>, <hpa@...or.com>,
<luto@...nel.org>, <peterz@...radead.org>, <willy@...radead.org>,
<akpm@...ux-foundation.org>, <david@...hat.com>, <lorenzo.stoakes@...cle.com>,
<Liam.Howlett@...cle.com>, <vbabka@...e.cz>, <rppt@...nel.org>,
<surenb@...gle.com>, <mhocko@...e.com>, <song@...nel.org>, <jolsa@...nel.org>,
<ast@...nel.org>, <daniel@...earbox.net>, <andrii@...nel.org>,
<martin.lau@...ux.dev>, <eddyz87@...il.com>, <yonghong.song@...ux.dev>,
<john.fastabend@...il.com>, <kpsingh@...nel.org>, <sdf@...ichev.me>,
<haoluo@...gle.com>, <jgg@...pe.ca>, <jhubbard@...dia.com>,
<peterx@...hat.com>, <jannh@...gle.com>, <pfalcato@...e.de>,
<shuah@...nel.org>, <seanjc@...gle.com>, <kvm@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>, <kvmarm@...ts.linux.dev>,
<linux-fsdevel@...r.kernel.org>, <linux-mm@...ck.org>, <bpf@...r.kernel.org>,
<linux-kselftest@...r.kernel.org>, <xmarcalx@...zon.co.uk>,
<kalyazin@...zon.co.uk>, <jackabt@...zon.co.uk>, <derekmn@...zon.co.uk>,
<tabba@...gle.com>, <ackerleytng@...gle.com>
Subject: Re: [PATCH v7 00/12] Direct Map Removal Support for guest_memfd
On Fri Nov 7, 2025 at 3:54 PM UTC, Brendan Jackman wrote:
> On Wed Sep 24, 2025 at 3:10 PM UTC, Patrick Roy wrote:
>> From: Patrick Roy <roypat@...zon.co.uk>
>>
>> [ based on kvm/next ]
>>
>> Unmapping virtual machine guest memory from the host kernel's direct map is a
>> successful mitigation against Spectre-style transient execution issues: If the
>> kernel page tables do not contain entries pointing to guest memory, then any
>> attempted speculative read through the direct map will necessarily be blocked
>> by the MMU before any observable microarchitectural side-effects happen. This
>> means that Spectre-gadgets and similar cannot be used to target virtual machine
>> memory. Roughly 60% of speculative execution issues fall into this category [1,
>> Table 1].
>>
>> This patch series extends guest_memfd with the ability to remove its memory
>> from the host kernel's direct map, to be able to attain the above protection
>> for KVM guests running inside guest_memfd.
>>
>> Additionally, a Firecracker branch with support for these VMs can be found on
>> GitHub [2].
>>
>> For more details, please refer to the v5 cover letter [v5]. No
>> substantial changes in design have taken place since.
>>
>> === Changes Since v6 ===
>>
>> - Drop patch for passing struct address_space to ->free_folio(), due to
>> possible races with freeing of the address_space. (Hugh)
>> - Stop using PG_uptodate / gmem preparedness tracking to keep track of
>> direct map state. Instead, use the lowest bit of folio->private. (Mike, David)
>> - Do direct map removal when establishing mapping of gmem folio instead
>> of at allocation time, due to impossibility of handling direct map
>> removal errors in kvm_gmem_populate(). (Patrick)
>> - Do TLB flushes after direct map removal, and provide a module
>> parameter to opt out from them, and a new patch to export
>> flush_tlb_kernel_range() to KVM. (Will)
>>
>> [1]: https://download.vusec.net/papers/quarantine_raid23.pdf
>> [2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding
>
> I just got around to trying this out, I checked out this patchset using
> its base-commit and grabbed the Firecracker branch. Things seem OK until
> I set the secrets_free flag in the Firecracker config which IIUC makes
> it set GUEST_MEMFD_FLAG_NO_DIRECT_MAP.
>
> If I set it, I find the guest doesn't show anything on the console.
> Running it in a VM and attaching GDB suggests that it's entering the
> guest repeatedly, it doesn't seem like the vCPU thread is stuck or
> anything. I'm a bit clueless about how to debug that (so far, whenever
> I've broken KVM, things always exploded very dramatically).
I discovered that Firecracker has a GDB stub, so I can just attach to
that and see what the guest is up to.
The issue is that the pvclock_vcpu_time_info in kvmclock is all zero:
(gdb) backtrace
#0 pvclock_tsc_khz (src=0xffffffff83a03000 <hv_clock_boot>) at ../arch/x86/kernel/pvclock.c:28
#1 0xffffffff8109d137 in kvm_get_tsc_khz () at ../arch/x86/include/asm/kvmclock.h:11
#2 0xffffffff835c1842 in kvm_get_preset_lpj () at ../arch/x86/kernel/kvmclock.c:128
#3 kvmclock_init () at ../arch/x86/kernel/kvmclock.c:332
#4 0xffffffff835c1487 in kvm_init_platform () at ../arch/x86/kernel/kvm.c:982
#5 0xffffffff835a83df in setup_arch (cmdline_p=cmdline_p@...ry=0xffffffff82e03f00) at ../arch/x86/kernel/setup.c:916
#6 0xffffffff83595a22 in start_kernel () at ../init/main.c:925
#7 0xffffffff835a7354 in x86_64_start_reservations (
real_mode_data=real_mode_data@...ry=0x36326c0 <error: Cannot access memory at address 0x36326c0>) at ../arch/x86/kernel/head64.c:507
#8 0xffffffff835a7466 in x86_64_start_kernel (real_mode_data=0x36326c0 <error: Cannot access memory at address 0x36326c0>)
at ../arch/x86/kernel/head64.c:488
#9 0xffffffff8103e7fd in secondary_startup_64 () at ../arch/x86/kernel/head_64.S:413
#10 0x0000000000000000 in ?? ()
(gdb) p *src
$3 = {version = 0, pad0 = 0, tsc_timestamp = 0, system_time = 0, tsc_to_system_mul = 0, tsc_shift = 0 '\000', flags = 0 '\000',
pad = "\000"}
This causes a divide by zero in kvm_get_tsc_khz().
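
To make the failure mode concrete, here is a minimal userspace sketch of the TSC frequency calculation. It assumes the division by tsc_to_system_mul happens in pvclock_tsc_khz(), as frame #0 of the backtrace suggests. The struct layout follows the fields printed by gdb above, and the arithmetic follows arch/x86/kernel/pvclock.c as far as I recall it. The helper name pvclock_tsc_khz_checked and its zero check are hypothetical, added only for illustration; they are not part of the kernel or of this series.

```c
#include <stdint.h>

/* Userspace mirror of the guest-visible pvclock structure (fields as shown by gdb). */
struct pvclock_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
} __attribute__((packed));

/*
 * Hypothetical helper: same arithmetic as the frequency calculation, but it
 * returns 0 instead of dividing when the host never populated the structure.
 */
static uint64_t pvclock_tsc_khz_checked(const struct pvclock_vcpu_time_info *src)
{
	uint64_t khz = 1000000ULL << 32;

	/* An all-zero structure would make the division below trap (#DE on x86). */
	if (src->tsc_to_system_mul == 0)
		return 0;

	khz /= src->tsc_to_system_mul;
	if (src->tsc_shift < 0)
		khz <<= -src->tsc_shift;
	else
		khz >>= src->tsc_shift;
	return khz;
}
```

With tsc_to_system_mul = 0x80000000 and tsc_shift = 0 this yields 2000000 kHz. With the all-zero structure from the gdb dump it returns 0 rather than faulting. The sketch only illustrates the symptom; it says nothing about why the host-side write to the pvclock page does not reach the guest.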
Probably the only reason I didn't see any console output is that I
forgot to set earlyprintk, oops...