Message-ID: <883a0f0d-7342-479e-aa3c-13deb7e99338@redhat.com>
Date: Tue, 6 Aug 2024 15:43:24 +0200
From: David Hildenbrand <david@...hat.com>
To: "Gowans, James" <jgowans@...zon.com>, "jack@...e.cz" <jack@...e.cz>,
"muchun.song@...ux.dev" <muchun.song@...ux.dev>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"rppt@...nel.org" <rppt@...nel.org>, "brauner@...nel.org"
<brauner@...nel.org>, "Graf (AWS), Alexander" <graf@...zon.de>,
"anthony.yznaga@...cle.com" <anthony.yznaga@...cle.com>,
"steven.sistare@...cle.com" <steven.sistare@...cle.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Durrant, Paul" <pdurrant@...zon.co.uk>,
"seanjc@...gle.com" <seanjc@...gle.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"Woodhouse, David" <dwmw@...zon.co.uk>,
"Saenz Julienne, Nicolas" <nsaenz@...zon.es>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"nh-open-source@...zon.com" <nh-open-source@...zon.com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"jgg@...pe.ca" <jgg@...pe.ca>
Subject: Re: [PATCH 00/10] Introduce guestmemfs: persistent in-memory
filesystem
> 1. Secret hiding: with guestmemfs all of the memory is out of the kernel
> direct map as an additional defence mechanism. This means no
> read()/write() syscalls to guestmemfs files, and no IO to it. The only
> way to access it is to mmap the file.
There are people interested in similar things for guest_memfd.
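
To make that access model concrete, my reading of it is roughly the
sketch below -- untested, and the mount point, file name and helper are
made up for illustration. The point is that read()/write() on the fd
are refused because the backing pages have no direct-map entries, so
mapping the file is the only way at the contents:

	#include <stddef.h>
	#include <fcntl.h>
	#include <sys/mman.h>

	/* Hypothetical guestmemfs mount and file; error handling omitted. */
	static void *map_guest_ram(size_t size)
	{
		int fd = open("/mnt/guestmemfs/vm0-ram", O_RDWR);

		/* read(fd, ...)/write(fd, ...) would fail; mmap() is the
		 * only supported access path. */
		return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
			    fd, 0);
	}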
>
> 2. No struct page overhead: the intended use case is for systems whose
> sole job is to be a hypervisor, typically for large (multi-GiB) VMs, so
> the majority of system RAM would be donated to this fs. We definitely
> don't want 4 KiB struct pages here as it would be a significant
> overhead. That's why guestmemfs carves the memory out in early boot and
> sets memblock flags to avoid struct page allocation. I don't know if
> hugetlbfs does anything fancy to avoid allocating PTE-level struct pages
> for its memory?
Sure, it's called HVO (HugeTLB Vmemmap Optimization) and can optimize
out a significant portion of the vmemmap.
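(For reference: that's CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP, enabled via
the hugetlb_free_vmemmap=on boot parameter or the
vm.hugetlb_optimize_vmemmap sysctl -- from memory, so double-check the
exact knob names.)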
>
> 3. guest_memfd interface: For confidential computing use-cases we need
> to provide a guest_memfd style interface so that these FDs can be used
> as a guest_memfd file in KVM memslots. Would there be interest in
> extending hugetlbfs to also support a guest_memfd style interface?
>
"Extending hugetlbfs" sounds wrong; hugetlbfs is a blast from the past
and not something people are particularly keen to extend for such use
cases. :)
Instead, as Jason said, we're looking into letting guest_memfd own and
manage large chunks of contiguous memory.
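
For completeness, the anonymous-fd flow that exists today looks roughly
like the sketch below -- untested, written against the 6.8+ uAPI with
error handling omitted, and the helper name is made up; how a persistent
or physically-contiguous backend would plug in behind
KVM_CREATE_GUEST_MEMFD is exactly the open question:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Create a guest_memfd on the VM and wire it into a memslot. */
	static int add_gmem_slot(int vm_fd, uint64_t gpa, uint64_t size,
				 void *shared_mem)
	{
		struct kvm_create_guest_memfd gmem = { .size = size };
		int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

		struct kvm_userspace_memory_region2 region = {
			.slot = 0,
			.flags = KVM_MEM_GUEST_MEMFD,
			.guest_phys_addr = gpa,
			.memory_size = size,
			/* shared (non-private) accesses go via a normal mapping */
			.userspace_addr = (uint64_t)(uintptr_t)shared_mem,
			.guest_memfd = gmem_fd,
			.guest_memfd_offset = 0,
		};
		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
	}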
> 4. Metadata designed for persistence: guestmemfs will need to keep
> simple internal metadata data structures (limited allocations, limited
> fragmentation) so that pages can easily and efficiently be marked as
> persistent via KHO. Something like slab allocations would probably be a
> no-go as then we'd need to persist and reconstruct the slab allocator. I
> don't know how hugetlbfs structures its fs metadata but I'm guessing it
> uses the slab and does lots of small allocations so trying to retrofit
> persistence via KHO to it may be challenging.
>
> 5. Integration with persistent IOMMU mappings: to keep DMA running
> across kexec, iommufd needs to know that the backing memory for an IOAS
> is persistent too. The idea is to do some DMA pinning of persistent
> files, which would require iommufd/guestmemfs integration - would we
> want to add this to hugetlbfs?
>
> 6. Virtualisation-specific APIs: starting to get a bit esoteric here,
> but use-cases like being able to carve out specific chunks of memory
> from a running VM and turn it into memory for another side car VM, or
> doing post-copy LM via DMA by mapping memory into the IOMMU but taking
> page faults on the CPU. This may require virtualisation-specific ioctls
> on the files which wouldn't be generally applicable to hugetlbfs.
>
> 7. NUMA control: a requirement is to always have correct NUMA affinity.
> While currently not implemented the idea is to extend the guestmemfs
> allocation to support specifying allocation sizes from each NUMA node at
> early boot, and then having multiple mount points, one per NUMA node (or
> something like that...). Unclear if this is something hugetlbfs would
> want.
>
> There are probably more potential issues, but those are the ones that
> come to mind... That being said, if hugetlbfs maintainers are interested
> in going in this direction then we can definitely look at enhancing
> hugetlbfs.
>
> I think there are two types of problems: "Would hugetlbfs want this
> functionality?" - that's the majority. An a few are "This would be hard
> with hugetlbfs!" - persistence probably falls into this category.
I'm rather asking myself whether you should instead teach/extend the
guest_memfd concept with some of what you propose here.
At least "guest_memfd" sounds a lot like the "anonymous fd" based
variant of guestmemfs ;)
Just like we have both hugetlbfs and memfds backed by hugetlb pages.
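
That is, something along the lines of the (untested) snippet below
already gives you hugetlb pages behind an anonymous fd instead of a
hugetlbfs file; the helper name is made up:

	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <unistd.h>

	/* hugetlb-backed memfd: the "anonymous fd" flavour of hugetlbfs,
	 * using the default hugepage size. */
	static int hugetlb_memfd(size_t size)
	{
		int fd = memfd_create("guest-ram", MFD_CLOEXEC | MFD_HUGETLB);

		ftruncate(fd, size);
		return fd;
	}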
--
Cheers,
David / dhildenb