[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a7affba9-0cea-4493-b868-392158b59d83@paulmck-laptop>
Date: Mon, 14 Jul 2025 17:21:47 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Chao Gao <chao.gao@...el.com>
Cc: linux-coco@...ts.linux.dev, x86@...nel.org, kvm@...r.kernel.org,
seanjc@...gle.com, pbonzini@...hat.com, eddie.dong@...el.com,
kirill.shutemov@...el.com, dave.hansen@...el.com,
dan.j.williams@...el.com, kai.huang@...el.com,
isaku.yamahata@...el.com, elena.reshetova@...el.com,
rick.p.edgecombe@...el.com, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC PATCH 00/20] TD-Preserving updates
On Fri, Jul 11, 2025 at 04:04:48PM +0800, Chao Gao wrote:
> On Fri, May 23, 2025 at 02:52:23AM -0700, Chao Gao wrote:
> >Hi Reviewers,
> >
> >This series adds support for runtime TDX module updates that preserve
> >running TDX guests (a.k.a, TD-Preserving updates). The goal is to gather
> >feedback on the feature design. Please pay attention to the following items:
> >
> >1. TD-Preserving updates are done in stop_machine() context. it copy-pastes
> > part of multi_cpu_stop() to guarantee step-locked progress on all CPUs.
> > But, there are a few differences between them. I am wondering whether
> > these differences have reached a point where abstracting a common
> > function might do more harm than good. See more details in patch 10.
Please note that multi_cpu_stop() is used by a number of functions,
so it is a good example of common code. But you are within your rights
to create your own function to pass to stop_machine(), and quite a
few call sites do just that. Most of them expect this function to be
executed on only one CPU, but these run on multiple CPUs:
o __apply_alternatives_multi_stop(), which has CPU 0 do the
work and the rest wati on it.
o cpu_enable_non_boot_scope_capabilities(), which works on
a per-CPU basis.
o do_join(), which is similar to your do_seamldr_install_module().
Somewhat similar, anyway.
o __ftrace_modify_code(), of which there are several, some of
which have some vague resemblance to your code.
o cache_rendezvous_handler(), which works on a per-CPU basis.
o panic_stop_irqoff_fn(), which is a simple barrier-wait, with
the last CPU to arrive doing the work.
I strongly recommend looking at these functions. They might
suggest an improved way to do what you are trying to accomplish with
do_seamldr_install_module().
> >2. P-SEAMLDR seamcalls (specificially SEAMRET from P-SEAMLDR) clear current
> > VMCS pointers, which may disrupt KVM. To prevent VMX instructions in IRQ
> > context from encountering NULL current-VMCS pointers, P-SEAMLDR
> > seamcalls are called with IRQ disabled. I'm uncertain if NMIs could
> > cause a problem, but I believe they won't. See more information in patch 3.
> >
> >3. Two helpers, cpu_vmcs_load() and cpu_vmcs_store(), are added in patch 3
> > to save and restore the current VMCS. KVM has a variant of cpu_vmcs_load(),
> > i.e., vmcs_load(). Extracting KVM's version would cause a lot of code
> > churn, and I don't think that can be justified for reducing ~16 LoC
> > duplication. Please let me know if you disagree.
>
> Gentle ping!
I do not believe that I was CCed on the original. Just in case you
were wondering why I did not respond. ;-)
> There are three open issues: one regarding stop_machine() and two related to
> interactions with KVM.
>
> Sean and Paul, do you have any preferences or insights on these matters?
Again, you are within your rights to create a new function and pass
it to stop_machine(). But it seems quite likely that there is a much
simpler way to get your job done.
Either way, please add a header comment stating what your function
is trying to do, which appears to be to wait for all CPUs to enter
do_seamldr_install_module() and then just leave? Sort of like
multi_cpu_stop(), except leaving interrupts enabled and not executing a
"msdata->fn(msdata->data);", correct?
If so, something like panic_stop_irqoff_fn() might be a simpler model,
perhaps with the touch_nmi_watchdog() and rcu_momentary_eqs() added.
Oh, and one bug: You must have interrupts disabled when you call
rcu_momentary_eqs(). Please fix this.
Thanx, Paul
Powered by blists - more mailing lists