Message-ID: <diqzwm803xa4.fsf@ackerleytng-ctop.c.googlers.com>
Date: Tue, 22 Jul 2025 11:17:39 -0700
From: Ackerley Tng <ackerleytng@...gle.com>
To: Xu Yilun <yilun.xu@...ux.intel.com>, Ira Weiny <ira.weiny@...el.com>
Cc: Yan Zhao <yan.y.zhao@...el.com>, Vishal Annapurve <vannapurve@...gle.com>,
Jason Gunthorpe <jgg@...pe.ca>, Alexey Kardashevskiy <aik@....com>, Fuad Tabba <tabba@...gle.com>, kvm@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, x86@...nel.org,
linux-fsdevel@...r.kernel.org, ajones@...tanamicro.com,
akpm@...ux-foundation.org, amoorthy@...gle.com, anthony.yznaga@...cle.com,
anup@...infault.org, aou@...s.berkeley.edu, bfoster@...hat.com,
binbin.wu@...ux.intel.com, brauner@...nel.org, catalin.marinas@....com,
chao.p.peng@...el.com, chenhuacai@...nel.org, dave.hansen@...el.com,
david@...hat.com, dmatlack@...gle.com, dwmw@...zon.co.uk,
erdemaktas@...gle.com, fan.du@...el.com, fvdl@...gle.com, graf@...zon.com,
haibo1.xu@...el.com, hch@...radead.org, hughd@...gle.com,
isaku.yamahata@...el.com, jack@...e.cz, james.morse@....com,
jarkko@...nel.org, jgowans@...zon.com, jhubbard@...dia.com, jroedel@...e.de,
jthoughton@...gle.com, jun.miao@...el.com, kai.huang@...el.com,
keirf@...gle.com, kent.overstreet@...ux.dev, kirill.shutemov@...el.com,
liam.merwick@...cle.com, maciej.wieczor-retman@...el.com,
mail@...iej.szmigiero.name, maz@...nel.org, mic@...ikod.net,
michael.roth@....com, mpe@...erman.id.au, muchun.song@...ux.dev,
nikunj@....com, nsaenz@...zon.es, oliver.upton@...ux.dev, palmer@...belt.com,
pankaj.gupta@....com, paul.walmsley@...ive.com, pbonzini@...hat.com,
pdurrant@...zon.co.uk, peterx@...hat.com, pgonda@...gle.com, pvorel@...e.cz,
qperret@...gle.com, quic_cvanscha@...cinc.com, quic_eberman@...cinc.com,
quic_mnalajal@...cinc.com, quic_pderrin@...cinc.com, quic_pheragu@...cinc.com,
quic_svaddagi@...cinc.com, quic_tsoni@...cinc.com, richard.weiyang@...il.com,
rick.p.edgecombe@...el.com, rientjes@...gle.com, roypat@...zon.co.uk,
rppt@...nel.org, seanjc@...gle.com, shuah@...nel.org, steven.price@....com,
steven.sistare@...cle.com, suzuki.poulose@....com, thomas.lendacky@....com,
usama.arif@...edance.com, vbabka@...e.cz, viro@...iv.linux.org.uk,
vkuznets@...hat.com, wei.w.wang@...el.com, will@...nel.org,
willy@...radead.org, xiaoyao.li@...el.com, yilun.xu@...el.com,
yuzenghui@...wei.com, zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce
KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
Xu Yilun <yilun.xu@...ux.intel.com> writes:
>> > > >> Yan, Yilun, would it work if, on conversion,
>> > > >>
>> > > >> 1. guest_memfd notifies IOMMU that a conversion is about to happen for a
>> > > >> PFN range
>> > > >
>> > > > It is the Guest fw call to release the pinning.
>> > >
>> > > I see, thanks for explaining.
>> > >
>> > > > By the time VMM get the
>> > > > conversion requirement, the page is already physically unpinned. So I
>> > > > agree with Jason the pinning doesn't have to reach to iommu from SW POV.
>> > > >
>> > >
>> > > If by the time KVM gets the conversion request, the page is unpinned,
>> > > then we're all good, right?
>> >
>> > Yes, unless guest doesn't unpin the page first by mistake.
>>
>> Or maliciously? :-(
>
> Yes.
>
>>
>> My initial response to this was that this is a bug and we don't need to be
>> concerned with it. However, can't this be a DOS from one TD to crash the
>> system if the host uses the private page for something else and the
>> machine #MC's?
>
> I think we are already doing something to prevent vcpus from executing
> then destroy VM, so no further TD accessing. But I assume there is
> concern a TD could just leak a lot of resources, and we are
> investigating if host can reclaim them.
>
> Thanks,
> Yilun
Sounds like a malicious guest could skip unpinning private memory, in
which case guest_memfd's unmap would fail, leading to the KVM_BUG_ON()
that Yan/Rick suggested here [1].
Actually, it seems like a legacy guest would also hit the same unmap
failures and the KVM_BUG_ON(), since when TDX Connect is enabled the
pinning mode is enforced even for non-IO private pages?
I hope your team's investigations find a good way for the host to
reclaim memory, at least from dead TDs! Otherwise this would be an open
hole for guests to leak a host's memory.
Circling back to the original topic [2], it sounds like we're okay with
the IOMMU *not* taking any refcounts on pages, and can rely on
guest_memfd to keep the page around on behalf of the VM?
[1] https://lore.kernel.org/all/diqzcya13x2j.fsf@ackerleytng-ctop.c.googlers.com/
[2] https://lore.kernel.org/all/CAGtprH_qh8sEY3s-JucW3n1Wvoq7jdVZDDokvG5HzPf0HV2=pg@mail.gmail.com/