linux-kernel - Re: [PATCH] KVM: x86: async_pf: check earlier if can deliver async pf

Message-ID: <b7d21cce-720f-4db3-bbb4-0be17e33cd09@amazon.com>
Date: Mon, 25 Nov 2024 15:50:05 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <pbonzini@...hat.com>, <tglx@...utronix.de>, <mingo@...hat.com>,
	<bp@...en8.de>, <dave.hansen@...ux.intel.com>, <hpa@...or.com>,
	<kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>, <david@...hat.com>,
	<peterx@...hat.com>, <oleg@...hat.com>, <vkuznets@...hat.com>,
	<gshan@...hat.com>, <graf@...zon.de>, <jgowans@...zon.com>,
	<roypat@...zon.co.uk>, <derekmn@...zon.com>, <nsaenz@...zon.es>,
	<xmarcalx@...zon.com>
Subject: Re: [PATCH] KVM: x86: async_pf: check earlier if can deliver async pf



On 21/11/2024 21:05, Sean Christopherson wrote:
> On Thu, Nov 21, 2024, Nikita Kalyazin wrote:
>> On 19/11/2024 13:24, Sean Christopherson wrote:
>>> None of this justifies breaking host-side, non-paravirt async page faults.  If a
>>> vCPU hits a missing page, KVM can schedule out the vCPU and let something else
>>> run on the pCPU, or enter idle and let the SMT sibling get more cycles, or maybe
>>> even enter a low enough sleep state to let other cores turbo a wee bit.
>>>
>>> I have no objection to disabling host async page faults, e.g. it's probably a net
>>> negative for 1:1 vCPU:pCPU pinned setups, but such disabling needs an opt-in from
>>> userspace.
>>
>> That's a good point, I didn't think about it.  The async work would still
>> need to execute somewhere in that case (or sleep in GUP until the page is
>> available).
> 
> The "async work" is often an I/O operation, e.g. to pull in the page from disk,
> or over the network from the source.  The *CPU* doesn't need to actively do
> anything for those operations.  The I/O is initiated, so the CPU can do something
> else, or go idle if there's no other work to be done.
> 
>> If processing the fault synchronously, the vCPU thread can also sleep in the
>> same way freeing the pCPU for something else,
> 
> If and only if the vCPU can handle a PV async #PF.  E.g. if the guest kernel flat
> out doesn't support PV async #PF, or the fault happened while the guest was in an
> incompatible mode, etc.
> 
> If KVM doesn't do async #PFs of any kind, the vCPU will spin on the fault until
> the I/O completes and the page is ready.

To check this, I ran a small experiment: I backed guest memory with a
file on FUSE and delayed the response to one of the read operations to
emulate a delay in fault processing.
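
For anyone wanting to sanity-check a delayed backing file from plain
userspace before involving a guest, a minimal sketch follows. It only
times the first touch of an mmap'ed page in the calling process; it is
not the setup used for the traces below. The function name
time_first_touch and the temporary file in the demo are hypothetical,
and on a real run the path would point at a file on the FUSE mount.

```python
import mmap
import os
import tempfile
import time


def time_first_touch(path, offset=0):
    """Map `path` read-only and time the first access to one page.

    The first access faults the page in. If the backing filesystem
    delays the read (and the page is not already in the page cache),
    the delay shows up in the returned duration.
    """
    page = mmap.ALLOCATIONGRANULARITY
    offset -= offset % page  # mmap offsets must be granularity-aligned
    with open(path, "rb") as f:
        remaining = os.fstat(f.fileno()).st_size - offset
        if remaining <= 0:
            raise ValueError("offset is beyond the end of the file")
        length = min(page, remaining)
        with mmap.mmap(f.fileno(), length,
                       access=mmap.ACCESS_READ, offset=offset) as m:
            start = time.monotonic()
            first_byte = m[0]  # this access triggers the page fault
            elapsed = time.monotonic() - start
    return first_byte, elapsed


if __name__ == "__main__":
    # Demo on a local temporary file (assumption: stands in for the
    # FUSE-backed file, so no artificial delay is expected here).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"Z" * 8192)
    try:
        byte, seconds = time_first_touch(tmp.name)
        print(f"first byte {byte:#x}, first touch took {seconds:.6f}s")
    finally:
        os.unlink(tmp.name)
```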

1. Original (the patch isn't applied)

vCPU thread (disk-sleeping):

[<0>] kvm_vcpu_block+0x62/0xe0
[<0>] kvm_arch_vcpu_ioctl_run+0x240/0x1e30
[<0>] kvm_vcpu_ioctl+0x2f1/0x860
[<0>] __x64_sys_ioctl+0x87/0xc0
[<0>] do_syscall_64+0x47/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Async task (disk-sleeping):

[<0>] folio_wait_bit_common+0x116/0x2e0
[<0>] filemap_fault+0xe5/0xcd0
[<0>] __do_fault+0x30/0xc0
[<0>] do_fault+0x9a/0x580
[<0>] __handle_mm_fault+0x684/0x8a0
[<0>] handle_mm_fault+0xc9/0x220
[<0>] __get_user_pages+0x248/0x12c0
[<0>] get_user_pages_remote+0xef/0x470
[<0>] async_pf_execute+0x99/0x1c0
[<0>] process_one_work+0x145/0x360
[<0>] worker_thread+0x294/0x3b0
[<0>] kthread+0xdb/0x110
[<0>] ret_from_fork+0x2d/0x50
[<0>] ret_from_fork_asm+0x1a/0x30

2. With the patch applied (no async task)

vCPU thread (disk-sleeping):

[<0>] folio_wait_bit_common+0x116/0x2e0
[<0>] filemap_fault+0xe5/0xcd0
[<0>] __do_fault+0x30/0xc0
[<0>] do_fault+0x36f/0x580
[<0>] __handle_mm_fault+0x684/0x8a0
[<0>] handle_mm_fault+0xc9/0x220
[<0>] __get_user_pages+0x248/0x12c0
[<0>] get_user_pages_unlocked+0xf7/0x380
[<0>] hva_to_pfn+0x2a2/0x440
[<0>] __kvm_faultin_pfn+0x5e/0x90
[<0>] kvm_mmu_faultin_pfn+0x1ec/0x690
[<0>] kvm_tdp_page_fault+0xba/0x160
[<0>] kvm_mmu_do_page_fault+0x1cc/0x210
[<0>] kvm_mmu_page_fault+0x8e/0x600
[<0>] vmx_handle_exit+0x14c/0x6c0
[<0>] kvm_arch_vcpu_ioctl_run+0xeb1/0x1e30
[<0>] kvm_vcpu_ioctl+0x2f1/0x860
[<0>] __x64_sys_ioctl+0x87/0xc0
[<0>] do_syscall_64+0x47/0x110
[<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
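
The traces above differ in which thread carries the __get_user_pages
frames. A rough sketch of telling the cases apart mechanically from
/proc/<pid>/stack-style text is below. The function names and the
returned labels are made up for illustration, and the frame names are
simply the ones visible in these traces, so they may differ on other
kernel versions.

```python
import re

# Matches lines such as "[<0>] kvm_vcpu_block+0x62/0xe0".
FRAME_RE = re.compile(
    r"^\[<[0-9a-fx]+>\]\s+([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_names(stack_text):
    """Return the function names from a kernel stack dump, top first."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def classify(stack_text):
    """Label a stack by where the thread is waiting (labels are ad hoc)."""
    names = frame_names(stack_text)
    in_gup = "__get_user_pages" in names
    if in_gup and "async_pf_execute" in names:
        return "async-worker-in-gup"
    if in_gup and "kvm_mmu_page_fault" in names:
        return "vcpu-in-gup"
    if "kvm_vcpu_block" in names:
        return "vcpu-in-kvm_vcpu_block"
    return "unknown"


if __name__ == "__main__":
    sample = """\
[<0>] kvm_vcpu_block+0x62/0xe0
[<0>] kvm_arch_vcpu_ioctl_run+0x240/0x1e30
[<0>] kvm_vcpu_ioctl+0x2f1/0x860
"""
    print(classify(sample))
```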

In both cases the fault handling code is blocked and the pCPU is free
for other tasks.  I don't see the vCPU spinning while it waits for the
I/O to complete when the async task isn't created.  I tried this both
with and without async PF enabled by the guest (MSR_KVM_ASYNC_PF_EN).
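
For reference, a small sketch of how the MSR_KVM_ASYNC_PF_EN value can
be decoded when checking whether the guest enabled async PF. The bit
layout is written from my reading of the KVM MSR documentation and the
uapi kvm_para.h constants, so please verify it against the kernel tree
in use. The helper name decode_async_pf_en and the dictionary keys are
invented for this example.

```python
# MSR index and flag bits, as I understand them from the KVM
# paravirt MSR documentation (assumption: check your kernel headers).
MSR_KVM_ASYNC_PF_EN = 0x4B564D02

KVM_ASYNC_PF_ENABLED = 1 << 0                # async PF enabled on this vCPU
KVM_ASYNC_PF_SEND_ALWAYS = 1 << 1            # may inject while guest is at CPL 0
KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT = 1 << 2  # deliver to L1 as #PF VM-exit
KVM_ASYNC_PF_DELIVERY_AS_INT = 1 << 3        # interrupt-based "page ready"

FLAG_MASK = 0x3F  # bits 5:0 carry flags (5:4 reserved), 63:6 the address


def decode_async_pf_en(value):
    """Split a raw MSR_KVM_ASYNC_PF_EN value into flags and address."""
    return {
        "enabled": bool(value & KVM_ASYNC_PF_ENABLED),
        "send_always": bool(value & KVM_ASYNC_PF_SEND_ALWAYS),
        "delivery_as_pf_vmexit": bool(
            value & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT),
        "delivery_as_int": bool(value & KVM_ASYNC_PF_DELIVERY_AS_INT),
        # 64-byte aligned guest physical address of the shared area
        "shared_area_gpa": value & ~FLAG_MASK,
    }


if __name__ == "__main__":
    # Hypothetical raw value, not taken from the experiment above.
    for key, val in decode_async_pf_en(0x12340040 | 0b1001).items():
        print(key, hex(val) if key == "shared_area_gpa" else val)
```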

What am I missing?

>> so the amount of work to be done looks equivalent (please correct me
>> otherwise).  What's the net gain of moving that to an async work in the host
>> async fault case? "while allowing interrupt delivery into the guest." -- is
>> this the main advantage?