Message-ID: <20230705071438.GC462772@hirez.programming.kicks-ass.net>
Date: Wed, 5 Jul 2023 09:14:38 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>
Cc: Dave Hansen <dave.hansen@...el.com>,
Sean Christopherson <seanjc@...gle.com>,
Isaku Yamahata <isaku.yamahata@...il.com>,
Kai Huang <kai.huang@...el.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Ashok Raj <ashok.raj@...el.com>,
Tony Luck <tony.luck@...el.com>,
"david@...hat.com" <david@...hat.com>,
"bagasdotme@...il.com" <bagasdotme@...il.com>,
"ak@...ux.intel.com" <ak@...ux.intel.com>,
Rafael J Wysocki <rafael.j.wysocki@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Isaku Yamahata <isaku.yamahata@...el.com>,
"nik.borisov@...e.com" <nik.borisov@...e.com>,
"hpa@...or.com" <hpa@...or.com>, Sagi Shahar <sagis@...gle.com>,
"imammedo@...hat.com" <imammedo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>, Chao Gao <chao.gao@...el.com>,
Len Brown <len.brown@...el.com>,
"sathyanarayanan.kuppuswamy@...ux.intel.com"
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Ying Huang <ying.huang@...el.com>,
Dan J Williams <dan.j.williams@...el.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v12 07/22] x86/virt/tdx: Add skeleton to enable TDX on demand

On Mon, Jul 03, 2023 at 08:55:56PM +0300, kirill.shutemov@...ux.intel.com wrote:
> On Mon, Jul 03, 2023 at 05:03:30PM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 03, 2023 at 07:40:55AM -0700, Dave Hansen wrote:
> > > On 7/3/23 03:49, Peter Zijlstra wrote:
> > > >> There are also latency and noisy neighbor concerns, e.g. we *really* don't want
> > > >> to end up in a situation where creating a TDX guest for a customer can observe
> > > >> arbitrary latency *and* potentially be disruptive to VMs already running on the
> > > >> host.
> > > > Well, that's a quality of implementation issue with the whole TDX
> > > > crapola. Sounds like we want to impose latency constraints on the
> > > > various TDX calls. Allowing it to consume arbitrary amounts of CPU time
> > > > is unacceptable in any case.
> > >
> > > For what it's worth, everybody knew that calling into the TDX module was
> > > going to be a black hole and that consuming large amounts of CPU at
> > > random times would drive people bat guano crazy.
> > >
> > > The TDX Module ABI spec does have "Leaf Function Latency" warnings for
> > > some of the module calls. But, it's basically a binary thing. A call
> > > is either normal or "longer than most".
> > >
> > > The majority of the "longer than most" cases are for initialization.
> > > The _most_ obscene runtime ones are chunked up and can return partial
> > > progress to limit latency spikes. But I don't think folks tried as hard
> > > on the initialization calls since they're only called once which
> > > actually seems pretty reasonable to me.
> > >
> > > Maybe we need three classes of "Leaf Function Latency":
> > > 1. Sane
> > > 2. "Longer than most"
> > > 3. Better turn the NMI watchdog off before calling this. :)
> > >
> > > Would that help?
> >
> > I'm thinking we want something along the lines of the Xen preemptible
> > hypercalls, except less crazy. Where the caller does:
> >
> > 	for (;;) {
> > 		ret = tdcall(fn, args);
> > 		if (ret == -EAGAIN) {
> > 			cond_resched();
> > 			continue;
> > 		}
> > 		break;
> > 	}
> >
> > And then the TDX black box provides a guarantee that any one tdcall (or
> > seamcall or whatever) never takes more than X ns (possibly even
> > configurable) and we get to raise a bug report if we can prove it
> > actually takes longer.
>
> TDG.VP.VMCALL TDCALL can take an arbitrary amount of time as it hands over
> control to the host/VMM.
>
> But I don't quite follow how it is different from the host stopping
> scheduling vCPU on a random instruction. It can happen at any point and
> TDCALL is not special from this PoV.

A guest will exit on a timer/interrupt and then the host can reschedule;
AFAIU this doesn't actually happen with these TDX calls: if control is
in that SEAM thing, it stays there until it's done.
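
To make the quoted retry loop concrete, here is a minimal userspace sketch of the same pattern. It is only a model, not TDX or kernel code: `struct chunked_op`, `bounded_call()`, `yield_point()` and `run_to_completion()` are hypothetical names, and the per-call `budget` merely stands in for the "never takes more than X ns" guarantee asked for above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a long-running "black box" operation. */
struct chunked_op {
	size_t remaining;	/* work units still to be done */
	size_t budget;		/* max work units per call (stand-in for "X ns") */
};

/*
 * Hypothetical stand-in for tdcall()/seamcall(): performs at most
 * op->budget units of work, then returns -EAGAIN if work is left over
 * or 0 once the operation has completed.
 */
static int bounded_call(struct chunked_op *op)
{
	size_t step = op->remaining < op->budget ? op->remaining : op->budget;

	op->remaining -= step;
	return op->remaining ? -EAGAIN : 0;
}

/*
 * Stand-in for cond_resched(): only counts the points at which the
 * caller could have been preempted.
 */
static unsigned long resched_points;

static void yield_point(void)
{
	resched_points++;
}

/* The caller-side loop from the quoted mail, against the model above. */
static int run_to_completion(struct chunked_op *op)
{
	int ret;

	/* A zero budget would never make progress and would loop forever. */
	assert(op->budget > 0);

	for (;;) {
		ret = bounded_call(op);
		if (ret == -EAGAIN) {
			yield_point();
			continue;
		}
		break;
	}
	return ret;
}
```

In this model the longest stretch without a scheduling opportunity is bounded by `budget`, regardless of how large `remaining` starts out; that is the property the loop relies on, and it is exactly what is missing if a call stays in SEAM mode until it is done.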