Message-ID: <od4dx6snqsl2qiocgf3jxm4dndxhrlvsfr22eveuno6nskgfdj@mxsywvku2jk5>
Date: Thu, 29 Jan 2026 14:36:14 +0000
From: Quentin Perret <qperret@...gle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Sean Christopherson <seanjc@...gle.com>, 
	Ackerley Tng <ackerleytng@...gle.com>, Alexey Kardashevskiy <aik@....com>, cgroups@...r.kernel.org, 
	kvm@...r.kernel.org, linux-doc@...r.kernel.org, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org, linux-mm@...ck.org, 
	linux-trace-kernel@...r.kernel.org, x86@...nel.org, akpm@...ux-foundation.org, 
	binbin.wu@...ux.intel.com, bp@...en8.de, brauner@...nel.org, chao.p.peng@...el.com, 
	chenhuacai@...nel.org, corbet@....net, dave.hansen@...el.com, 
	dave.hansen@...ux.intel.com, david@...hat.com, dmatlack@...gle.com, erdemaktas@...gle.com, 
	fan.du@...el.com, fvdl@...gle.com, haibo1.xu@...el.com, hannes@...xchg.org, 
	hch@...radead.org, hpa@...or.com, hughd@...gle.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, jack@...e.cz, james.morse@....com, jarkko@...nel.org, 
	jgowans@...zon.com, jhubbard@...dia.com, jroedel@...e.de, jthoughton@...gle.com, 
	jun.miao@...el.com, kai.huang@...el.com, keirf@...gle.com, kent.overstreet@...ux.dev, 
	liam.merwick@...cle.com, maciej.wieczor-retman@...el.com, mail@...iej.szmigiero.name, 
	maobibo@...ngson.cn, mathieu.desnoyers@...icios.com, maz@...nel.org, 
	mhiramat@...nel.org, mhocko@...nel.org, mic@...ikod.net, michael.roth@....com, 
	mingo@...hat.com, mlevitsk@...hat.com, mpe@...erman.id.au, muchun.song@...ux.dev, 
	nikunj@....com, nsaenz@...zon.es, oliver.upton@...ux.dev, palmer@...belt.com, 
	pankaj.gupta@....com, paul.walmsley@...ive.com, pbonzini@...hat.com, peterx@...hat.com, 
	pgonda@...gle.com, prsampat@....com, pvorel@...e.cz, richard.weiyang@...il.com, 
	rick.p.edgecombe@...el.com, rientjes@...gle.com, rostedt@...dmis.org, roypat@...zon.co.uk, 
	rppt@...nel.org, shakeel.butt@...ux.dev, shuah@...nel.org, steven.price@....com, 
	steven.sistare@...cle.com, suzuki.poulose@....com, tabba@...gle.com, tglx@...utronix.de, 
	thomas.lendacky@....com, vannapurve@...gle.com, vbabka@...e.cz, viro@...iv.linux.org.uk, 
	vkuznets@...hat.com, wei.w.wang@...el.com, will@...nel.org, willy@...radead.org, 
	wyihan@...gle.com, xiaoyao.li@...el.com, yan.y.zhao@...el.com, yilun.xu@...el.com, 
	yuzenghui@...wei.com, zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v1 05/37] KVM: guest_memfd: Wire up
 kvm_get_memory_attributes() to per-gmem attributes

On Thursday 29 Jan 2026 at 09:42:45 (-0400), Jason Gunthorpe wrote:
> On Thu, Jan 29, 2026 at 11:10:12AM +0000, Quentin Perret wrote:
> 
> > A not-fully-thought-through-and-possibly-ridiculous idea that crossed
> > my mind some time ago was to make KVM itself a proper dmabuf
> > importer. 
> 
> AFAIK this is already the plan. Since Intel cannot tolerate having the
> private MMIO mapped into a VMA *at all*, there is no other choice.
> 
> Since Intel has to build it, I figured everyone would want to use it
> because it is probably going to be much faster than reading VMAs.

Ack.

> Especially in the modern world of MMIO BARs in the 512GB range.
> 
> > You'd essentially see a guest as a 'device' (probably with an
> > actual struct dev representing it), and the stage-2 MMU in front of it
> > as its IOMMU. That could potentially allow KVM to implement dma_map_ops
> > for that guest 'device' by mapping/unmapping pages into its stage-2 and
> > such. 
> 
> The plan isn't something so wild..

I'll take that as a compliment ;-)

Not dying on that hill, but it didn't feel _that_ horrible after I'd
thought about it for a little while. From the host's PoV, a guest is
just another thing that can address memory: it has its own address
space, with a page-table we control in front of it. If you squint hard
enough, that doesn't look _that_ different from a device. Oh well.
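
Purely for illustration, the (now moot) picture in my head was roughly
along these lines -- every helper name below is made up:

    /* Completely hypothetical sketch -- not the actual plan, per the above. */
    static dma_addr_t kvm_guest_dma_map_page(struct device *dev,
                                             struct page *page,
                                             unsigned long offset, size_t size,
                                             enum dma_data_direction dir,
                                             unsigned long attrs)
    {
            struct kvm *kvm = dev_get_drvdata(dev);    /* the guest 'device' */
            gpa_t ipa;

            /* Pick a free range of guest IPA space, map it at stage-2. */
            ipa = kvm_alloc_ipa_range(kvm, size);               /* made up */
            kvm_stage2_map(kvm, ipa, page_to_phys(page) + offset, size);

            return (dma_addr_t)ipa;             /* the IPA _is_ the DMA addr */
    }

    static const struct dma_map_ops kvm_guest_dma_ops = {
            .map_page   = kvm_guest_dma_map_page,
            /* .unmap_page etc. would tear the stage-2 mapping back down */
    };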

> https://github.com/jgunthorpe/linux/commits/dmabuf_map_type/
> 
> The "Physical Address List" mapping type will let KVM just get a
> normal phys_addr_t list and do its normal stuff with it. No need for
> hacky DMA API things.

Thanks, I'll read up.
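
I've no idea what the interface in that branch actually looks like yet,
but FWIW the shape I'm hoping for is roughly this -- entirely
hypothetical, made-up names throughout:

    struct phys_range {
            phys_addr_t addr;
            size_t      size;
    };

    int kvm_gmem_import_dmabuf(struct kvm *kvm, struct dma_buf *dmabuf,
                               gpa_t ipa)
    {
            struct phys_range *ranges;
            int i, nr;

            /* Importer gets plain physical ranges, no dma_addr_t/IOVA. */
            nr = dma_buf_get_phys_ranges(dmabuf, &ranges);      /* made up */
            if (nr < 0)
                    return nr;

            for (i = 0; i < nr; i++) {
                    kvm_stage2_map(kvm, ipa, ranges[i].addr,    /* made up */
                                   ranges[i].size);
                    ipa += ranges[i].size;
            }

            return 0;
    }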

> Probably what will be hard for KVM is that it gets the entire 512GB in
> one shot and will have to chop it up to install the whole thing into
> the PTE sizes available in the S2. I don't think it even has logic
> like that right now??

The closest thing I can think of is the KVM_PRE_FAULT_MEMORY stuff in
the KVM API, which forces KVM to fault in an arbitrary range of guest
IPA space. There should at least be bits of infrastructure there that
can be re-used for this, I guess.
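
The chopping itself doesn't look too scary on paper, at least --
presumably the usual greedy walk picking the largest block size that
both sides are aligned to, something like (illustrative only, assumes
page-aligned inputs and a 4K granule, kvm_stage2_map() is made up):

    static const size_t s2_block_sizes[] = { SZ_1G, SZ_2M, SZ_4K };

    static void s2_map_range(struct kvm *kvm, gpa_t ipa, phys_addr_t pa,
                             size_t len)
    {
            while (len) {
                    size_t sz;
                    int i;

                    /* Largest block that IPA, PA and len all allow. */
                    for (i = 0; i < ARRAY_SIZE(s2_block_sizes); i++) {
                            sz = s2_block_sizes[i];
                            if (IS_ALIGNED(ipa | pa, sz) && len >= sz)
                                    break;
                    }

                    kvm_stage2_map(kvm, ipa, pa, sz);
                    ipa += sz;
                    pa += sz;
                    len -= sz;
            }
    }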

> > It gets really funny when a CoCo guest decides to share back a subset of
> > that dmabuf with the host, and I'm still wrapping my head around how
> > we'd make that work, but at this point I'm ready to be told how all the
> > above already doesn't work and that I should go back to the peanut
> > gallery :-)
> 
> Oh, I don't actually know how that ends up working but I suppose it
> could be meaningfully done :\

FWIW, for mobile/pKVM we'll want to use dmabufs for more than just
passing MMIO to guests; they'll likely be used for memory in certain
cases too. There are examples in the KVM Forum talk I linked in the
previous email, but in short, being able to back guest memory regions
with dmabufs is very helpful. It lets us, e.g., hand a guest physically
contiguous memory allocated from a CMA-backed dmabuf heap on systems
that don't tolerate scattered private memory well (for either
functional or performance reasons). I certainly wish we could ignore
this type of hardware, but sadly we don't have that luxury.
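
(For reference, the userspace side of that allocation is just the
standard dma-heap ioctl; the heap name under /dev/dma_heap/ is
system-dependent, "reserved" below is only an example:)

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/dma-heap.h>

    int alloc_contig_dmabuf(size_t len)
    {
            struct dma_heap_allocation_data data = {
                    .len      = len,
                    .fd_flags = O_RDWR | O_CLOEXEC,
            };
            int ret, heap;

            heap = open("/dev/dma_heap/reserved", O_RDONLY | O_CLOEXEC);
            if (heap < 0)
                    return -1;

            ret = ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &data);
            close(heap);

            /* data.fd is the dmabuf fd, ready to back a guest memslot. */
            return ret < 0 ? -1 : (int)data.fd;
    }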

In cases like that, we certainly expect the guest to share back parts
of the memory it's been given (at least a swiotlb bounce buffer so it
can do virtio, etc.), and that may very well land in the middle of a
dmabuf-backed memslot. In fact, the guest has no clue what is backing
its memory regions, so we can't really expect it _not_ to do that :/
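
So whatever interface we end up with will need to cope with, say, a
swiotlb window punched into the middle of the buffer. Conceptually it's
just a hole-punch that leaves up to two private remainders -- all
helpers below are made up:

    static void handle_share_back(struct kvm *kvm, gpa_t base, size_t len,
                                  gpa_t start, gpa_t end)
    {
            /* Guest shared back [start, end) inside [base, base + len). */
            s2_unmap(kvm, base, len);                           /* made up */

            if (start > base)                       /* left remainder */
                    s2_map_private(kvm, base, start - base);
            if (end < base + len)                   /* right remainder */
                    s2_map_private(kvm, end, base + len - end);

            s2_map_shared(kvm, start, end - start); /* made up */
    }

The remainders likely need re-mapping with smaller block sizes at the
new edges, which is where the chopping logic above comes back in.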
