Message-ID: <358df653-e572-4e76-954a-15b230d09263@amd.com>
Date: Thu, 24 Oct 2024 18:01:16 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...el.com>, linux-kernel@...r.kernel.org,
tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
Thomas.Lendacky@....com, nikunj@....com, Santosh.Shukla@....com,
Vasant.Hegde@....com, Suravee.Suthikulpanit@....com, David.Kaplan@....com,
x86@...nel.org, hpa@...or.com, peterz@...radead.org, seanjc@...gle.com,
pbonzini@...hat.com, kvm@...r.kernel.org
Subject: Re: [RFC 02/14] x86/apic: Initialize Secure AVIC APIC backing page
On 10/24/2024 5:19 PM, Borislav Petkov wrote:
> On Thu, Oct 24, 2024 at 09:31:01AM +0530, Neeraj Upadhyay wrote:
>> Please let me know if I didn't understand your questions correctly. The performance
>> concerns here are w.r.t. these backing page allocations being part of a single
>> hugepage.
>>
>> Grouping the allocations together allows these pages to be part of the same 2M NPT
>> and RMP table entry, which can provide better performance compared to having a
>> separate 4K entry for each backing page. For example, to send an IPI to a target CPU,
>> the ->send_IPI callback (executing on the source CPU) in the Secure AVIC driver writes
>> to the backing page of the target CPU. Having these backing pages as part of a single
>> 2M entry could provide better caching of the translation and require a single TLB
>> entry at the source CPU.
>
> Lemme see if I understand you correctly: you want a single 2M page to contain
> *all* backing pages so that when the HV wants to send IPIs etc, the first vCPU
With Secure AVIC enabled, the source vCPU writes directly to the Interrupt Request
Register (IRR) offset in the target vCPU's backing page. So, the IPI is requested
directly in the target vCPU's backing page from the source vCPU's context, not by the HV.
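
For illustration, a rough sketch of that flow (savic_backing_page() is a made-up
placeholder here, not the actual driver symbol; the offsets follow the standard
xAPIC register layout):

	/* Sketch of the Secure AVIC ->send_IPI path: the source vCPU sets
	 * the vector bit directly in the target vCPU's backing page. */
	static void savic_send_ipi(int cpu, int vector)
	{
		/* Placeholder lookup of the target CPU's 4K backing page. */
		void *bp = savic_backing_page(cpu);

		/* IRR is 8 x 32-bit registers at 16-byte stride. */
		unsigned int off = APIC_IRR + (vector / 32) * 0x10;

		/* Direct guest write; no hypervisor involvement. */
		atomic_or(BIT(vector % 32), (atomic_t *)(bp + off));
	}
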
> will load the page translation into the TLB and the following ones will have
> it already?
>
Yes, though the TLB in question is the source vCPU's: after the first write loads the
2M translation, writes to the other target CPUs' backing pages already hit in the
source vCPU's TLB.
> Versus having separate 4K pages which would mean that every time a vCPU's backing
> page is needed, every vCPU would have to do a TLB walk and pull it in so that
> the mapping's there?
>
The walk is done by the source CPU here, as it is the one writing to the
backing pages of the target vCPUs.
> Am I close?
>
I have clarified some parts above. Basically, the source vCPU writes directly to
the remote backing pages.
> If so, what's the problem with loading that backing page each time you VMRUN
> the vCPU?
>
As I clarified above, it's the source vCPU which needs to load each backing page's translation.
> IOW, how noticeable would that be?
>
I don't have the data at this point. That is why I will send this contiguous
allocation as a separate patch (if required), once I have data on workloads
which are impacted by this.
> And what guarantees that the 2M page containing the backing pages would always
> remain in the TLB?
>
For smp_call_function_many(), where a source CPU sends IPIs to multiple CPUs, the
source CPU writes to the backing pages of the different target CPUs within this
function, so the accesses have temporal locality. For other use cases, I need to
enable perf with Secure AVIC, collect the TLB misses on an IPI benchmark, and get
back with the numbers.
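
For reference, the contiguous allocation I have in mind is roughly the below (just
a sketch with made-up names, ignoring MAX_ORDER limits for very large CPU counts;
a real patch would also handle the SNP private-page setup for the region):

	/* Sketch: carve all APIC backing pages out of one physically
	 * contiguous, 2M-aligned block so that they can share a single
	 * 2M NPT/RMP entry (and one TLB entry at the source CPU). */
	static void *savic_backing_area;

	static int __init savic_alloc_backing_pages(void)
	{
		/* One 4K page per possible CPU, rounded up to 2M. */
		unsigned long size = ALIGN(nr_cpu_ids * PAGE_SIZE, PMD_SIZE);

		/* The buddy allocator returns blocks naturally aligned to
		 * their order, so a 2M-order allocation is 2M-aligned. */
		struct page *pg = alloc_pages(GFP_KERNEL | __GFP_ZERO,
					      get_order(size));
		if (!pg)
			return -ENOMEM;

		savic_backing_area = page_address(pg);
		return 0;
	}

	/* Each vCPU's backing page is then a fixed offset into the block. */
	static void *savic_backing_page(int cpu)
	{
		return savic_backing_area + cpu * PAGE_SIZE;
	}
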
- Neeraj
> Hmmm.
>