linux-kernel - Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map when adding them to the RMP table

Message-ID: <20240116165025.g4iouboabyxkn5nd@amd.com>
Date: Tue, 16 Jan 2024 10:50:25 -0600
From: Michael Roth <michael.roth@....com>
To: Dave Hansen <dave.hansen@...el.com>
CC: Tom Lendacky <thomas.lendacky@....com>, Borislav Petkov <bp@...en8.de>,
	<x86@...nel.org>, <kvm@...r.kernel.org>, <linux-coco@...ts.linux.dev>,
	<linux-mm@...ck.org>, <linux-crypto@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <tglx@...utronix.de>, <mingo@...hat.com>,
	<jroedel@...e.de>, <hpa@...or.com>, <ardb@...nel.org>, <pbonzini@...hat.com>,
	<seanjc@...gle.com>, <vkuznets@...hat.com>, <jmattson@...gle.com>,
	<luto@...nel.org>, <dave.hansen@...ux.intel.com>, <slp@...hat.com>,
	<pgonda@...gle.com>, <peterz@...radead.org>,
	<srinivas.pandruvada@...ux.intel.com>, <rientjes@...gle.com>,
	<tobin@....com>, <vbabka@...e.cz>, <kirill@...temov.name>,
	<ak@...ux.intel.com>, <tony.luck@...el.com>,
	<sathyanarayanan.kuppuswamy@...ux.intel.com>, <alpergun@...gle.com>,
	<jarkko@...nel.org>, <ashish.kalra@....com>, <nikunj.dadhania@....com>,
	<pankaj.gupta@....com>, <liam.merwick@...cle.com>, <zhi.a.wang@...el.com>,
	Brijesh Singh <brijesh.singh@....com>, <rppt@...nel.org>
Subject: Re: [PATCH v1 11/26] x86/sev: Invalidate pages from the direct map
 when adding them to the RMP table

On Tue, Jan 16, 2024 at 10:19:09AM -0600, Michael Roth wrote:
> I did some performance tests which do seem to indicate that
> pre-splitting the directmap to 4K can substantially improve certain
> SNP guest workloads. This test involves running a single 1TB SNP guest
> with 128 vCPUs running "stress --vm 128 --vm-bytes 5G --vm-keep" to
> rapidly fault in all of its memory via lazy acceptance, and then
> measuring the rate that gmem pages are being allocated on the host by
> monitoring "FileHugePages" from /proc/meminfo to get some rough gauge
> of how quickly a guest can fault in its initial working set prior to
> reaching steady state. The data is a bit noisy but seems to indicate
> significant improvement by taking the directmap updates out of the
> lazy acceptance path, and I would only expect that to become more
> significant as you scale up the number of guests / vCPUs.
> 
>   # Average fault-in rate across 3 runs, measured in GB/s
>                     unpinned | pinned to NUMA node 0
>   DirectMap4K           12.9 | 12.1
>              stddev      2.2 |  1.3
>   DirectMap2M+split      8.0 |  8.9
>              stddev      1.3 |  0.8
> 
> The downside of course is potential impact for non-SNP workloads
> resulting from splitting the directmap. Mike Rapoport's numbers make
> me feel a little better about it, but I don't think they apply directly
> to the notion of splitting the entire directmap. Even the LWN article
> summarizes:
> 
>   "The conclusion from all of this, Rapoport continued, was that
>   direct-map fragmentation just does not matter — for data access, at
>   least. Using huge-page mappings does still appear to make a difference
>   for memory containing the kernel code, so allocator changes should
>   focus on code allocations — improving the layout of allocations for
>   loadable modules, for example, or allowing vmalloc() to allocate huge
>   pages for code. But, for kernel-data allocations, direct-map
>   fragmentation simply appears to not be worth worrying about."
> 
> So at the very least, if we went down this path, it would be worth
> investigating the following areas in addition to general perf testing:
> 
>   1) Only splitting directmap regions corresponding to kernel-allocatable
>      *data* (hopefully that's even feasible...)
>   2) Potentially deferring the split until an SNP guest is actually
>      run, so there isn't any impact just from having SNP enabled (though
>      you still take a hit from RMP checks in that case so maybe it's not
>      worthwhile, but that itself has been noted as a concern for users
>      so it would be nice to not make things even worse).
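
For anyone wanting to reproduce the fault-in measurement quoted above, here
is a minimal sketch of sampling "FileHugePages" from /proc/meminfo and
turning the growth into a rate. The function names are hypothetical, the
1-second interval is arbitrary, and treating "GB" as 2^30 bytes is an
assumption; this is not the exact script used for the numbers in the table.

```python
import time


def parse_meminfo_kb(text, field="FileHugePages"):
    """Return the value (in kB) of `field` from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def rate_gb_per_s(kb_start, kb_end, seconds):
    """Growth between two samples in GB/s (assumption: 1 GB = 2**30 bytes)."""
    return (kb_end - kb_start) / (1024 * 1024) / seconds


def sample_rate(interval=1.0, path="/proc/meminfo"):
    """Read the counter twice, `interval` seconds apart, on the host."""
    with open(path) as f:
        before = parse_meminfo_kb(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_meminfo_kb(f.read())
    return rate_gb_per_s(before, after, interval)
```

Calling sample_rate() in a loop while the guest runs "stress" would give a
rough per-interval rate; averaging over the fault-in phase is left out here.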

There's another potential area of investigation I forgot to mention that
doesn't involve pre-splitting the directmap. It makes use of the fact
that the kernel should never be accessing a 2MB mapping that overlaps with
private guest memory if the backing PFN for the guest memory is a 2MB page.
Since there's no chance for overlap (well, maybe via a 1GB directmap entry,
but forcing those down to 2M is not as dramatic a change), there's no need to
actually split the directmap entry in these cases since they won't
result in unexpected RMP faults.
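
To make the idea concrete, here is a rough model of the decision in Python
(not kernel code). It assumes naturally aligned mappings, so a directmap
entry can only reach beyond the private range when it is larger than the
private page backing the guest memory. The function name and the size
constants are made up for illustration.

```python
PAGE_4K = 4 << 10
PAGE_2M = 2 << 20
PAGE_1G = 1 << 30


def directmap_split_target(private_size, directmap_size):
    """Return the size the directmap entry would need to be split down to,
    or None when no split is needed.

    Assumption: both the private page and the directmap entry are naturally
    aligned, so an entry no larger than the private page covers only
    private memory and cannot overlap with anything the kernel may touch.
    """
    if directmap_size <= private_size:
        return None
    return private_size


# 2M private page under a 2M directmap entry: nothing to split.
print(directmap_split_target(PAGE_2M, PAGE_2M))
# 2M private page under a 1G directmap entry: split to 2M, not 4K.
print(directmap_split_target(PAGE_2M, PAGE_1G) == PAGE_2M)
# 4K private page under a 2M directmap entry: still needs a 4K split.
print(directmap_split_target(PAGE_4K, PAGE_2M) == PAGE_4K)
```

The 4K-private case is the one that still pays for directmap updates in the
lazy acceptance path, so the benefit depends on how often guest memory ends
up backed by 2MB pages.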

So if pre-splitting the directmap ends up having too many downsides, then
there may still be some potential for optimizing the current approach to a
fair degree.

-Mike