linux-kernel - Re: [RFC 02/14] x86/apic: Initialize Secure AVIC APIC backing page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e4568d3d-f115-4931-bbc6-9a32eb04ee1c@amd.com>
Date: Wed, 9 Oct 2024 23:22:58 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: Dave Hansen <dave.hansen@...el.com>, linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
 Thomas.Lendacky@....com, nikunj@....com, Santosh.Shukla@....com,
 Vasant.Hegde@....com, Suravee.Suthikulpanit@....com, bp@...en8.de,
 David.Kaplan@....com, x86@...nel.org, hpa@...or.com, peterz@...radead.org,
 seanjc@...gle.com, pbonzini@...hat.com, kvm@...r.kernel.org
Subject: Re: [RFC 02/14] x86/apic: Initialize Secure AVIC APIC backing page



On 10/9/2024 10:33 PM, Dave Hansen wrote:
> On 10/9/24 09:31, Neeraj Upadhyay wrote:
>>> Second, this looks to be allocating a potentially large physically
>>> contiguous chunk of memory, then handing it out 4k at a time.  The loop is:
>>>
>>> 	buf = alloc(NR_CPUS * PAGE_SIZE);
>>> 	for (i = 0; i < NR_CPUS; i++)
>>> 		foo[i] = buf + i * PAGE_SIZE;
>>>
>>> but could be:
>>>
>>> 	for (i = 0; i < NR_CPUS; i++)
>>> 		foo[i] = alloc(PAGE_SIZE);
>>>
>>> right?
>>
>> Single contiguous allocation is done here to avoid TLB impact due to backing page
>> accesses (e.g. sending ipi requires writing to target CPU's backing page).
>> I can change it to allocation in chunks of size 2M instead of one big allocation.
>> Is that fine? Also, as described in commit message, reserving entire 2M chunk
>> for backing pages also prevents splitting of NPT entries into individual 4K entries.
>> This can happen if part of a 2M page is not allocated for backing pages by guest
>> and page state change (from private to shared) is done for that part.
> 
> Ick.
> 
> First, this needs to be thoroughly commented, not in the changelogs.
> 

Ok.

> Second, this is premature optimization at its finest.  Just imagine if
> _every_ site that needed 16k or 32k of shared memory decided to allocate
> a 2M chunk for this _and_ used it sparsely.  What's the average number
> of vCPUs in a guest.  4?  8?
> 

Got it.

> The absolute minimum that we can do here is some stupid infrastructure
> that you call for allocating shared pages, or for things that _will_ be
> converted to shared so they get packed.
> 
> But hacking uncommented 2M allocations into every site seems like
> insanity to me.
> 
> IMNHO, you can either invest the time to put the infrastructure in place
> and get 2M pages, or you can live with the suboptimal performance of 4k.

I will start with 4K. For later, I will get the performance numbers to propose
a change in allocation scheme  - for ex, allocating a bigger contiguous
batch from the total allocation required for backing pages (num_possible_cpus() * 4K)
without doing 2M reservation.


- Neeraj