Message-ID: <18729cf6-bf3a-4a11-a9fc-a35792cd1736@linux.intel.com>
Date: Sun, 28 Jul 2024 19:16:44 +0800
From: Binbin Wu <binbin.wu@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>, Yan Zhao <yan.y.zhao@...el.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, sagis@...gle.com,
linux-kselftest@...r.kernel.org, afranji@...gle.com, erdemaktas@...gle.com,
isaku.yamahata@...el.com, pbonzini@...hat.com, shuah@...nel.org,
pgonda@...gle.com, haibo1.xu@...el.com, chao.p.peng@...ux.intel.com,
vannapurve@...gle.com, runanwang@...gle.com, vipinsh@...gle.com,
jmattson@...gle.com, dmatlack@...gle.com, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH v5 09/29] KVM: selftests: TDX: Add report_fatal_error test
On 4/23/2024 5:23 AM, Sean Christopherson wrote:
> On Thu, Apr 18, 2024, Yan Zhao wrote:
>> On Tue, Apr 16, 2024 at 11:50:19AM -0700, Sean Christopherson wrote:
>>> On Mon, Apr 15, 2024, Yan Zhao wrote:
>>>> On Mon, Apr 15, 2024 at 08:05:49AM +0000, Ackerley Tng wrote:
>>>>>>> The Intel GHCI Spec says in R12, bit 63 is set if the GPA is valid. As a
>>>>>> But above "__LINE__" is obviously not a valid GPA.
>>>>>>
>>>>>> Do you think it's better to check "data_gpa" is with shared bit on and
>>>>>> aligned in 4K before setting bit 63?
>>>>>>
>>>>> I read "valid" in the spec to mean that the value in R13 "should be
>>>>> considered as useful" or "should be passed on to the host VMM via the
>>>>> TDX module", and not so much as in "validated".
>>>>>
>>>>> We could validate the data_gpa as you suggested to check alignment and
>>>>> shared bit, but perhaps that could be a higher-level function that calls
>>>>> tdg_vp_vmcall_report_fatal_error()?
>>>>>
>>>>> If it helps, shall we rename "data_gpa" to "data" for this lower-level,
>>>>> generic helper function and remove these two lines
>>>>>
>>>>> if (data_gpa)
>>>>> error_code |= 0x8000000000000000;
>>>>>
>>>>> A higher-level function could perhaps do the validation as you suggested
>>>>> and then set bit 63.
>>>> This could be all right. But I'm not sure if it would be a burden for
>>>> higher-level function to set bit 63 which is of GHCI details.
>>>>
>>>> What about adding another "data_gpa_valid" parameter and then test
>>>> "data_gpa_valid" rather than test "data_gpa" to set bit 63?
>>> Who cares what the GHCI says about validity? The GHCI is a spec for getting
>>> random guests to play nice with random hosts. Selftests own both, and the goal
>>> of selftests is to test that KVM (and KVM's dependencies) adhere to their relevant
>>> specs. And more importantly, KVM is NOT inheriting the GHCI ABI verbatim[*].
>>>
>>> So except for the bits and bobs that *KVM* (or the TDX module) gets involved in,
>>> just ignore the GHCI (or even deliberately abuse it). To put it differently, use
>>> selftests to verify *KVM's* ABI and functionality.
>>>
>>> As it pertains to this thread, while I haven't looked at any of this in detail,
>>> I'm guessing that whether or not bit 63 is set is a complete "don't care", i.e.
>>> KVM and the TDX Module should pass it through as-is.
>>>
>>> [*] https://lore.kernel.org/all/Zg18ul8Q4PGQMWam@google.com
>> Ok. It makes sense to KVM_EXIT_TDX.
>> But what if the TDVMCALL is handled in TDX specific code in kernel in future?
>> (not possible?)
> KVM will "handle" ReportFatalError, and will do so before this code lands[*], but
> I *highly* doubt KVM will ever do anything but forward the information to userspace,
> e.g. as KVM_SYSTEM_EVENT_CRASH with data[] filled in with the raw register information.
>
>> Should guest set bits correctly according to GHCI?
> No. Selftests exist first and foremost to verify KVM behavior, not to verify
> firmware behavior. We can and should use selftests to verify that *KVM* doesn't
> *violate* the GHCI, but that doesn't mean that selftests themselves can't ignore
> and/or abuse the GHCI, especially since the GHCI definition for ReportFatalError
> is frankly awful.
>
> E.g. the GHCI prescribes actual behavior for R13, but then doesn't say *anything*
> about what's in the data page. Why!?!?! If the format in the data page is
> completely undefined, what's the point of restricting R13 to only be allowed to
> hold a GPA?
The description of R13 in GHCI:
4KB-aligned GPA where additional error data is shared by the TD. The
VMM must validate that this GPA has the Shared bit set. In other words,
that a shared-mapping is used, and that this is a valid mapping for the
TD. This shared memory region is expected to hold a zero-terminated
string.
IIUC, according to the GHCI, R13 holds the GPA of a 4K-aligned shared buffer
that the TDX guest provides to pass an additional error message to the VMM,
i.e., it needs to be a shared GPA. And the buffer is expected to hold a
zero-terminated string.
Do you think "a zero-terminated string" describes the format in the data
page?
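
To make that reading concrete, here is a minimal sketch (not code from this
series) of the checks the GHCI text seems to expect from the VMM. The helper
names, the shared_bit parameter and the one-page scan limit are all made up
for illustration; the position of the Shared bit depends on the guest's GPA
width and is not stated in this thread, so the caller passes it in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RFE_PAGE_SIZE 4096ULL	/* "4KB-aligned GPA" per the GHCI text */

/*
 * Hypothetical check: the GPA must be 4K aligned and have the Shared bit
 * set. shared_bit (< 64) is an assumption supplied by the caller.
 */
static bool rfe_gpa_looks_valid(uint64_t gpa, unsigned int shared_bit)
{
	if (gpa & (RFE_PAGE_SIZE - 1))
		return false;

	return (gpa >> shared_bit) & 1;
}

/*
 * Hypothetical check of the page content: return the string length if a
 * terminating NUL is found within the page, or -1 otherwise.
 */
static long rfe_string_len(const char *page)
{
	for (size_t i = 0; i < RFE_PAGE_SIZE; i++) {
		if (page[i] == '\0')
			return (long)i;
	}

	return -1;
}
```

Whether selftests should actually enforce any of this is exactly the question
below.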
>
> And the wording is just as awful:
>
> The VMM must validate that this GPA has the Shared bit set. In other words,
> that a shared-mapping is used, and that this is a valid mapping for the TD.
>
> I'm pretty sure it's just saying that the TDX module isn't going to verify the
> operand, i.e. that the VMM needs to protect itself, but it would be so much
> better to simply state "The TDX Module does not verify this GPA", because saying
> the VMM "must" do something leads to pointless discussions like this one, where
> we're debating over whether or not *our* VMM should inject an error into *our* guest.
>
> Anyways, we should do what makes sense for selftests and ignore the stupidity of
> the GHCI when doing so yields better code. If that means abusing R13, go for it.
> If it's a sticking point for anyone, just use one of the "optional" registers.
>
> Whatever we do, bury the host and guest side of selftests behind #defines or helpers
> so that there are at most two pieces of code that care which register holds which
> piece of information.
>
> [*] https://lore.kernel.org/all/20240404230247.GU2444378@ls.amr.corp.intel.com
>
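
On the #defines/helpers point, something along these lines might be enough.
This is only a rough sketch under my own assumptions: the struct, macro and
function names are invented here and are not from the series. The only layout
detail taken from the thread is that bit 63 of R12 flags R13 as valid;
everything else is passed through untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* R12 bit 63: "GPA is valid" per the GHCI text quoted in this thread. */
#define TDX_RFE_DATA_VALID	(1ULL << 63)

/* Hypothetical container for the two registers both sides care about. */
struct tdx_rfe_regs {
	uint64_t r12;	/* error code plus the valid flag */
	uint64_t r13;	/* data; a selftest may put any value here */
};

/* Guest side: the only place that knows how the registers are packed. */
static struct tdx_rfe_regs tdx_rfe_encode(uint64_t error_code, uint64_t data,
					  bool data_valid)
{
	struct tdx_rfe_regs regs = {
		.r12 = error_code & ~TDX_RFE_DATA_VALID,
		.r13 = data,
	};

	if (data_valid)
		regs.r12 |= TDX_RFE_DATA_VALID;

	return regs;
}

/* Host side: the only place that knows how to unpack them. */
static uint64_t tdx_rfe_error_code(const struct tdx_rfe_regs *regs)
{
	return regs->r12 & ~TDX_RFE_DATA_VALID;
}

static bool tdx_rfe_data_valid(const struct tdx_rfe_regs *regs)
{
	return regs->r12 & TDX_RFE_DATA_VALID;
}

static uint64_t tdx_rfe_data(const struct tdx_rfe_regs *regs)
{
	return regs->r13;
}
```

With an explicit data_valid argument, a test can pass __LINE__ as data without
the helper guessing from "data != 0" whether bit 63 should be set.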