Message-ID: <dcdce8e1-51da-42da-a892-59c6ccd9de23@redhat.com>
Date: Fri, 15 Nov 2024 18:10:33 +0100
From: David Hildenbrand <david@...hat.com>
To: Patrick Roy <roypat@...zon.co.uk>, tabba@...gle.com,
 quic_eberman@...cinc.com, seanjc@...gle.com, pbonzini@...hat.com,
 jthoughton@...gle.com, ackerleytng@...gle.com, vannapurve@...gle.com,
 rppt@...nel.org
Cc: graf@...zon.com, jgowans@...zon.com, derekmn@...zon.com,
 kalyazin@...zon.com, xmarcalx@...zon.com, linux-mm@...ck.org,
 corbet@....net, catalin.marinas@....com, will@...nel.org,
 chenhuacai@...nel.org, kernel@...0n.name, paul.walmsley@...ive.com,
 palmer@...belt.com, aou@...s.berkeley.edu, hca@...ux.ibm.com,
 gor@...ux.ibm.com, agordeev@...ux.ibm.com, borntraeger@...ux.ibm.com,
 svens@...ux.ibm.com, gerald.schaefer@...ux.ibm.com, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org,
 hpa@...or.com, luto@...nel.org, peterz@...radead.org, rostedt@...dmis.org,
 mhiramat@...nel.org, mathieu.desnoyers@...icios.com, shuah@...nel.org,
 kvm@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 loongarch@...ts.linux.dev, linux-riscv@...ts.infradead.org,
 linux-s390@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 linux-kselftest@...r.kernel.org, faresx@...zon.com
Subject: Re: [RFC PATCH v3 0/6] Direct Map Removal for guest_memfd

On 15.11.24 17:59, Patrick Roy wrote:
> 
> 
> On Tue, 2024-11-12 at 14:52 +0000, David Hildenbrand wrote:
>> On 12.11.24 15:40, Patrick Roy wrote:
>>> I remember talking to someone at some point about whether we could reuse
>>> the proc-local stuff for guest memory, but I cannot remember the outcome
>>> of that discussion... (or maybe I just wanted to have a discussion about
>>> it, but forgot to follow up on that thought?).  I guess we wouldn't use
>>> proc-local _allocations_, but rather just set up proc-local mappings of
>>> the gmem allocations that have been removed from the direct map.
>>
>> Yes. And likely only for memory we really access / try to access, if possible.
> 
> Well, if we start setting up mm-local mappings on demand for the things
> we want to access, we're back in TLB flush hell, no?

At least the on-demand mapping shouldn't require a TLB flush? Only 
"unmapping" would, if we want to restrict the size of a "mapped pool".
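
(A minimal sketch of that asymmetry, assuming the existing
set_direct_map_invalid_noflush()/set_direct_map_default_noflush()
primitives; the gmem_* wrapper names here are made up for illustration,
not taken from the series:)

#include <linux/set_memory.h>
#include <linux/mm.h>
#include <asm/tlbflush.h>

static int gmem_zap_direct_map(struct page *page)
{
	unsigned long addr = (unsigned long)page_address(page);
	int ret;

	/* Invalidate the direct-map PTE; stale TLB entries may remain. */
	ret = set_direct_map_invalid_noflush(page);
	if (ret)
		return ret;

	/* The unmap side is the one that needs the expensive flush. */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	return 0;
}

static int gmem_restore_direct_map(struct page *page)
{
	/*
	 * Re-establishing the mapping needs no flush: there cannot be a
	 * stale TLB entry for a VA that was not mapped until now.
	 */
	return set_direct_map_default_noflush(page);
}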

Anyhow, this would be a pure optimization, to avoid the expense of 
mapping everything when in practice you'd likely not access most of it.

(my theory, happy to be told I'm wrong :) )

> And we can't know
> ahead of time what needs to be mapped, so everything would need to be
> mapped (unless we do something like mm-local mapping a page on first
> access and then just never unmapping it again, under the assumption
> that establishing the mapping won't be expensive)

Right, the whole problem is that we don't know that upfront.
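
(For illustration, a sketch of the map-on-first-access idea above; this
is purely hypothetical, ignores locking, and uses vmap() merely as a
stand-in for whatever the actual mm-local mapping primitive would be --
vmap() itself maps into the shared vmalloc area, so only the caching
pattern carries over:)

#include <linux/vmalloc.h>
#include <linux/xarray.h>
#include <linux/mm.h>

struct gmem_local_cache {
	struct xarray mappings;		/* pfn -> kernel VA, never torn down */
};

static void *gmem_local_map(struct gmem_local_cache *cache, struct page *page)
{
	unsigned long pfn = page_to_pfn(page);
	void *va;

	/* Fast path: mapped on an earlier access, reuse it. */
	va = xa_load(&cache->mappings, pfn);
	if (va)
		return va;

	/*
	 * Slow path: establish a fresh mapping. No TLB flush is needed
	 * here; only tearing a mapping down would require one, and this
	 * scheme never unmaps.
	 */
	va = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
	if (!va)
		return NULL;

	xa_store(&cache->mappings, pfn, va, GFP_KERNEL);
	return va;
}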

> 
>>>
>>> I'm wondering: conceptually, how exactly would this differ from
>>> Sean's idea about messing with the CR3 register inside KVM to
>>> temporarily install page tables that contain all the gmem stuff?
>>> Wouldn't we run into the same interrupt problems that Sean foresaw
>>> for the CR3 stuff? (which, admittedly, I still don't quite follow :(
>>> ).
>>
>> I'd need some more details on that. If anything would rely on the direct
>> mapping (from IRQ context?) then ... we obviously cannot remove the
>> direct mapping :)
> 
> I've talked to Fares internally, and it seems that generally doing
> mm-local mappings of guest memory would work for us. We also figured out
> what the "interrupt problem" is, namely that if we receive an interrupt
> while executing in a context that has mm-local mappings available, those
> mappings will continue to be available while the interrupt is being
> handled.

Isn't that likely also the case with secretmem, where we removed the 
direct map but still have an effective per-mm mapping in the user-space 
portion of the page table?
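
(A userspace analogy, for the archives: with memfd_secret(2) the backing
pages are gone from the kernel direct map, yet the per-mm mapping below
stays usable while the accessing task handles interrupts. Assumes a
kernel with secretmem enabled and recent enough glibc headers:)

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

int main(void)
{
	/* Pages backing this fd are removed from the direct map. */
	int fd = syscall(SYS_memfd_secret, 0);
	if (fd < 0)
		return 1;
	if (ftruncate(fd, 4096) < 0)
		return 1;

	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		       fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/*
	 * The mapping lives in the user-space half of this mm's page
	 * table, so it remains available exactly as discussed above.
	 */
	strcpy(p, "secret");

	munmap(p, 4096);
	close(fd);
	return 0;
}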

> I'm talking to my security folks to see how much of a concern
> this is for the speculation hardening we're trying to achieve. Will keep
> you in the loop there :)

Thanks!

-- 
Cheers,

David / dhildenb

