linux-kernel - Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT


Message-ID: <3999aadf-92a8-43f9-8d9d-84aa47e7d1ae@linux.intel.com>
Date: Fri, 31 May 2024 09:22:51 +0800
From: Binbin Wu <binbin.wu@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>,
 Paolo Bonzini <pbonzini@...hat.com>
Cc: Michael Roth <michael.roth@....com>, kvm@...r.kernel.org,
 linux-coco@...ts.linux.dev, linux-mm@...ck.org,
 linux-crypto@...r.kernel.org, x86@...nel.org, linux-kernel@...r.kernel.org,
 tglx@...utronix.de, mingo@...hat.com, jroedel@...e.de,
 thomas.lendacky@....com, hpa@...or.com, ardb@...nel.org,
 vkuznets@...hat.com, jmattson@...gle.com, luto@...nel.org,
 dave.hansen@...ux.intel.com, slp@...hat.com, pgonda@...gle.com,
 peterz@...radead.org, srinivas.pandruvada@...ux.intel.com,
 rientjes@...gle.com, dovmurik@...ux.ibm.com, tobin@....com, bp@...en8.de,
 vbabka@...e.cz, kirill@...temov.name, ak@...ux.intel.com,
 tony.luck@...el.com, sathyanarayanan.kuppuswamy@...ux.intel.com,
 alpergun@...gle.com, jarkko@...nel.org, ashish.kalra@....com,
 nikunj.dadhania@....com, pankaj.gupta@....com, liam.merwick@...cle.com,
 Brijesh Singh <brijesh.singh@....com>,
 Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page
 State Change VMGEXIT



On 5/30/2024 4:02 AM, Sean Christopherson wrote:
> On Tue, May 28, 2024, Paolo Bonzini wrote:
>> On Mon, May 27, 2024 at 2:26 PM Binbin Wu <binbin.wu@...ux.intel.com> wrote:
>>>> It seems like TDX should be able to do something similar by limiting the
>>>> size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
>>>> returning TDG_VP_VMCALL_RETRY to guest if the original size was greater
>>>> than TDX_MAP_GPA_MAX_LEN. But at that point you're effectively done with
>>>> the entire request and can return to guest, so it actually seems a little
>>>> more straightforward than the SNP case above. E.g. TDX has a 1:1 mapping
>>>> between TDG_VP_VMCALL_MAP_GPA and KVM_HC_MAP_GPA_RANGE events. (And even
>>>> similar names :))
>>>>
>>>> So doesn't seem like there's a good reason to expose any of these
>>>> throttling details to userspace,
>> I think userspace should never be worried about throttling. I would
>> say it's up to the guest to split the GPA into multiple ranges,
> I agree in principle, but in practice I can understand not wanting to split up
> the conversion in the guest due to the additional overhead of the world switches.
>
>> but that's not how arch/x86/coco/tdx/tdx.c is implemented so instead we can
>> do the split in KVM instead. It can be a module parameter or VM attribute,
>> establishing the size that will be processed in a single TDVMCALL.
> Is it just interrupts that are problematic for conversions?  I assume so, because
> I can't think of anything else where telling the guest to retry would be appropriate
> and useful.

The concern was lockup detection in the guest.

>
> If so, KVM shouldn't need to unconditionally restrict the size for a single
> TDVMCALL, KVM just needs to ensure interrupts are handled soonish.  To do that,
> KVM could use a much smaller chunk size, e.g. 64KiB (completely made up number),
> and keep processing the TDVMCALL as long as there is no interrupt pending.
> Hopefully that would obviate the need for a tunable.

Thanks for the suggestion.
This way, interrupts can be injected into the guest in a timely manner, and
the guest's lockup detection should not be a problem.
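A rough sketch of the loop suggested above, under stated assumptions: the request is walked in fixed-size chunks, and the handler bails out with a retry status only when work remains and an interrupt is pending. `process_map_gpa()`, `MAP_GPA_RETRY`, the callback types and the `sim_*` helpers are all hypothetical illustrations, not actual KVM or TDX code. The 64KiB chunk size is the "completely made up" example value from the quoted mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical chunk size: the made-up 64KiB example from the thread. */
#define MAP_GPA_CHUNK_SIZE (64ULL * 1024)

enum map_gpa_status { MAP_GPA_DONE, MAP_GPA_RETRY };

struct map_gpa_req {
	uint64_t gpa;   /* next GPA still to be converted */
	uint64_t size;  /* bytes still to be converted */
};

/* Hypothetical stand-in for "is an interrupt pending for this vCPU?" */
typedef bool (*irq_pending_fn)(void *ctx);
/* Hypothetical stand-in for converting one chunk (e.g. one exit to userspace). */
typedef void (*convert_fn)(uint64_t gpa, uint64_t len, void *ctx);

enum map_gpa_status process_map_gpa(struct map_gpa_req *req, convert_fn convert,
				    irq_pending_fn irq_pending, void *ctx)
{
	while (req->size) {
		uint64_t len = req->size < MAP_GPA_CHUNK_SIZE ?
			       req->size : MAP_GPA_CHUNK_SIZE;

		assert(len > 0);  /* invariant: the loop always makes progress */
		convert(req->gpa, len, ctx);
		req->gpa += len;
		req->size -= len;

		/* Return to the guest early only if work remains AND an IRQ is waiting. */
		if (req->size && irq_pending(ctx))
			return MAP_GPA_RETRY;
	}
	return MAP_GPA_DONE;
}

/* Tiny simulation harness (hypothetical): counts chunks, fakes one interrupt. */
struct sim {
	unsigned int chunks;     /* chunks converted so far */
	unsigned int irq_after;  /* an IRQ is "pending" right after this many chunks */
};

void sim_convert(uint64_t gpa, uint64_t len, void *ctx)
{
	(void)gpa;
	(void)len;
	((struct sim *)ctx)->chunks++;
}

bool sim_irq_pending(void *ctx)
{
	struct sim *s = ctx;

	return s->chunks == s->irq_after;
}
```

With a 1MiB request (16 chunks) and a fake interrupt after the third chunk, the first call returns `MAP_GPA_RETRY` with 832KiB left, and a second call finishes the remaining 13 chunks. The design point is that the common case (no pending interrupt) completes the whole request in one go, so no tunable limit on the TDVMCALL size is needed; this assumes the guest resumes the conversion from wherever the retry left off.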

Regarding the chunk size: if it is too small, it will increase the cost of
kernel/userspace context switches.
Maybe 2MB?
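To put rough numbers on the chunk-size trade-off, assume (simplistically) that each chunk costs one KVM-to-userspace round trip; the number of round trips for a range is then a ceiling division. `round_trips()` and the `KiB`/`MiB`/`GiB` macros are illustrative helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define KiB(x) ((uint64_t)(x) << 10)
#define MiB(x) ((uint64_t)(x) << 20)
#define GiB(x) ((uint64_t)(x) << 30)

/*
 * Round trips needed to convert 'range' bytes in 'chunk'-sized pieces,
 * assuming one KVM->userspace exit per chunk (a simplifying assumption).
 */
uint64_t round_trips(uint64_t range, uint64_t chunk)
{
	assert(chunk != 0);
	return (range + chunk - 1) / chunk;  /* ceiling division */
}
```

Under that model, converting 1GiB takes 16384 round trips with 64KiB chunks versus 512 with 2MB chunks. 2MB is also the x86-64 large-page size (512 4KiB pages), which may make it a natural granularity, though the thread does not state that as the reason.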