Message-ID: <bfe488aedf5e9c43b2578bbdcbf281cb60c5db41.camel@intel.com>
Date: Wed, 2 Jul 2025 15:47:28 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "pbonzini@...hat.com" <pbonzini@...hat.com>, "seanjc@...gle.com"
<seanjc@...gle.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>
CC: "Shutemov, Kirill" <kirill.shutemov@...el.com>, "quic_eberman@...cinc.com"
<quic_eberman@...cinc.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Hansen, Dave"
<dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>, "tabba@...gle.com"
<tabba@...gle.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>, "Du, Fan"
<fan.du@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "michael.roth@....com"
<michael.roth@....com>, "Weiny, Ira" <ira.weiny@...el.com>, "vbabka@...e.cz"
<vbabka@...e.cz>, "binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>, "Yamahata, Isaku"
<isaku.yamahata@...el.com>, "Peng, Chao P" <chao.p.peng@...el.com>,
"Annapurve, Vishal" <vannapurve@...gle.com>, "jroedel@...e.de"
<jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>, "pgonda@...gle.com"
<pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 15/21] KVM: TDX: Support huge page splitting with
exclusive kvm->mmu_lock
On Thu, 2025-04-24 at 11:08 +0800, Yan Zhao wrote:
> +static int tdx_spte_demote_private_spte(struct kvm *kvm, gfn_t gfn,
> + enum pg_level level, struct page *page)
> +{
> + int tdx_level = pg_level_to_tdx_sept_level(level);
> + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> + gpa_t gpa = gfn_to_gpa(gfn);
> + u64 err, entry, level_state;
> +
> + do {
> + err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> + &entry, &level_state);
> + } while (err == TDX_INTERRUPTED_RESTARTABLE);
> +
> + if (unlikely(tdx_operand_busy(err))) {
> + tdx_no_vcpus_enter_start(kvm);
> + err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> + &entry, &level_state);
> + tdx_no_vcpus_enter_stop(kvm);
> + }
> +
> + if (KVM_BUG_ON(err, kvm)) {
> + pr_tdx_error_2(TDH_MEM_PAGE_DEMOTE, err, entry, level_state);
> + return -EIO;
> + }
> + return 0;
> +}
> +
> +int tdx_sept_split_private_spt(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> + void *private_spt)
> +{
> + struct page *page = virt_to_page(private_spt);
> + int ret;
> +
> + if (KVM_BUG_ON(to_kvm_tdx(kvm)->state != TD_STATE_RUNNABLE || level != PG_LEVEL_2M, kvm))
> + return -EINVAL;
> +
> + ret = tdx_sept_zap_private_spte(kvm, gfn, level, page);
> + if (ret <= 0)
> + return ret;
> +
> + tdx_track(kvm);
> +
> + return tdx_spte_demote_private_spte(kvm, gfn, level, page);
> +}
The latest TDX docs describe a feature called NON_BLOCKING_RESIZE, which allows
demote to be done without first putting the entry in a blocked state. If we
relied on this feature we could simplify this code: without the transitory
blocked state there would be fewer scenarios to account for. We could also make
the demote operation tolerate failure (roll back on a SEAMCALL BUSY error),
which would mean the mmu write lock is no longer needed. It would have helped
the fault path demote issue, which we have now worked around. But still, it
seems more flexible as well as simpler.
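
Just to sketch it (making assumptions about the feature here, since I haven't
checked against an actual interface: that with NON_BLOCKING_RESIZE
tdh_mem_page_demote() can be issued directly without the zap/track dance, and
that a busy error leaves the large mapping intact so the caller can simply
retry), the split hook could collapse to something like:

int tdx_sept_split_private_spt(struct kvm *kvm, gfn_t gfn, enum pg_level level,
			       void *private_spt)
{
	struct page *page = virt_to_page(private_spt);
	int tdx_level = pg_level_to_tdx_sept_level(level);
	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
	gpa_t gpa = gfn_to_gpa(gfn);
	u64 err, entry, level_state;

	if (KVM_BUG_ON(kvm_tdx->state != TD_STATE_RUNNABLE ||
		       level != PG_LEVEL_2M, kvm))
		return -EINVAL;

	do {
		err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
					  &entry, &level_state);
	} while (err == TDX_INTERRUPTED_RESTARTABLE);

	/*
	 * No blocked state to unwind on contention, so no need to kick
	 * vCPUs out and retry under exclusive mmu_lock. Just let the
	 * caller retry the whole split.
	 */
	if (unlikely(tdx_operand_busy(err)))
		return -EBUSY;

	if (KVM_BUG_ON(err, kvm)) {
		pr_tdx_error_2(TDH_MEM_PAGE_DEMOTE, err, entry, level_state);
		return -EIO;
	}
	return 0;
}

No tdx_sept_zap_private_spte(), no tdx_track(), and no
tdx_no_vcpus_enter_start()/stop().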
What about relying on this feature for KVM TDX huge mappings?