Message-ID: <2436d521-aa4c-45ac-9ccc-be9a4b5cb391@intel.com>
Date: Wed, 9 Oct 2024 10:03:19 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Neeraj Upadhyay <Neeraj.Upadhyay@....com>, linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
 Thomas.Lendacky@....com, nikunj@....com, Santosh.Shukla@....com,
 Vasant.Hegde@....com, Suravee.Suthikulpanit@....com, bp@...en8.de,
 David.Kaplan@....com, x86@...nel.org, hpa@...or.com, peterz@...radead.org,
 seanjc@...gle.com, pbonzini@...hat.com, kvm@...r.kernel.org
Subject: Re: [RFC 02/14] x86/apic: Initialize Secure AVIC APIC backing page

On 10/9/24 09:31, Neeraj Upadhyay wrote:
>> Second, this looks to be allocating a potentially large physically
>> contiguous chunk of memory, then handing it out 4k at a time.  The loop is:
>>
>> 	buf = alloc(NR_CPUS * PAGE_SIZE);
>> 	for (i = 0; i < NR_CPUS; i++)
>> 		foo[i] = buf + i * PAGE_SIZE;
>>
>> but could be:
>>
>> 	for (i = 0; i < NR_CPUS; i++)
>> 		foo[i] = alloc(PAGE_SIZE);
>>
>> right?
> 
> A single contiguous allocation is done here to avoid the TLB impact of backing-page
> accesses (e.g. sending an IPI requires writing to the target CPU's backing page).
> I can change it to allocate in 2M chunks instead of one big allocation.
> Is that fine? Also, as described in the commit message, reserving an entire 2M chunk
> for backing pages also prevents splitting of NPT entries into individual 4K entries.
> That splitting can happen if part of a 2M page is not allocated for backing pages by
> the guest and a page state change (from private to shared) is done for that part.
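
So, if I'm following, the 2M-chunk version of the loop above would look
something like this (still hand-waving alloc(); 512 is just
PMD_SIZE/PAGE_SIZE on x86):

	for (i = 0; i < NR_CPUS; i += 512) {
		/* one PMD-aligned 2M chunk covers 512 CPUs' worth of 4k pages */
		buf = alloc(PMD_SIZE);
		for (j = 0; j < 512 && i + j < NR_CPUS; j++)
			foo[i + j] = buf + j * PAGE_SIZE;
	}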

Ick.

First, this reasoning needs to be thoroughly commented in the code, not
just in the changelogs.

Second, this is premature optimization at its finest.  Just imagine if
_every_ site that needed 16k or 32k of shared memory decided to allocate
a 2M chunk for this _and_ used it sparsely.  What's the average number
of vCPUs in a guest?  4?  8?  That's 16k or 32k actually used out of
each 2M chunk, somewhere around 1-2% of it.

The absolute minimum we can do here is some dumb, common infrastructure
that you call to allocate shared pages (or pages that _will_ be converted
to shared) so that they get packed together.
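
Something dumb like this would do (all names made up, locking and error
handling omitted, just to show the packing idea):

	/*
	 * Hand out 4k pages carved out of a 2M chunk so that pages which
	 * will be converted to shared end up packed together instead of
	 * splintering the NPT entry for the whole 2M region.
	 */
	static void *snp_alloc_shared_page(void)
	{
		static void *chunk;
		static unsigned long used;	/* bytes handed out of the current chunk */
		void *page;

		if (!chunk || used == PMD_SIZE) {
			chunk = alloc_2m_chunk();	/* hypothetical 2M, PMD-aligned allocation */
			if (!chunk)
				return NULL;
			used = 0;
		}
		page = chunk + used;
		used += PAGE_SIZE;
		return page;
	}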

But hacking uncommented 2M allocations into every site seems like
insanity to me.

IMNSHO, you can either invest the time to put that infrastructure in
place and get 2M pages, or you can live with the suboptimal performance
of 4k pages.
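
The 4k path really is as simple as the loop quoted above, e.g. (sketch
using the plain page allocator; the apic_backing_page[] array is made up):

	static void *apic_backing_page[NR_CPUS];

	static int alloc_backing_pages(void)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			/* one zeroed 4k page per CPU, no packing */
			apic_backing_page[cpu] = (void *)get_zeroed_page(GFP_KERNEL);
			if (!apic_backing_page[cpu])
				return -ENOMEM;
		}
		return 0;
	}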
