Message-ID: <86f36cfd-65b6-4b71-9272-d17e5e41b57c@arm.com>
Date: Fri, 29 Nov 2024 14:55:36 +0000
From: Steven Price <steven.price@....com>
To: Suzuki K Poulose <suzuki.poulose@....com>, kvm@...r.kernel.org,
kvmarm@...ts.linux.dev
Cc: Catalin Marinas <catalin.marinas@....com>, Marc Zyngier <maz@...nel.org>,
Will Deacon <will@...nel.org>, James Morse <james.morse@....com>,
Oliver Upton <oliver.upton@...ux.dev>, Zenghui Yu <yuzenghui@...wei.com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Joey Gouly <joey.gouly@....com>, Alexandru Elisei
<alexandru.elisei@....com>, Christoffer Dall <christoffer.dall@....com>,
Fuad Tabba <tabba@...gle.com>, linux-coco@...ts.linux.dev,
Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>,
Gavin Shan <gshan@...hat.com>, Shanker Donthineni <sdonthineni@...dia.com>,
Alper Gun <alpergun@...gle.com>, "Aneesh Kumar K . V"
<aneesh.kumar@...nel.org>
Subject: Re: [PATCH v5 18/43] arm64: RME: Handle realm enter/exit
On 29/11/2024 13:45, Suzuki K Poulose wrote:
> Hi Steven
>
> On 29/11/2024 12:18, Steven Price wrote:
>> Hi Suzuki,
>>
>> Sorry for the very slow response to this. Coming back to this I'm having
>> doubts, see below.
>>
>> On 17/10/2024 14:00, Suzuki K Poulose wrote:
>>> On 04/10/2024 16:27, Steven Price wrote:
>>>> Entering a realm is done using a SMC call to the RMM. On exit the
>>>> exit-codes need to be handled slightly differently to the normal KVM
>>>> path so define our own functions for realm enter/exit and hook them
>>>> in if the guest is a realm guest.
>>>>
>>>> Signed-off-by: Steven Price <steven.price@....com>
>> ...
>>>> diff --git a/arch/arm64/kvm/rme-exit.c b/arch/arm64/kvm/rme-exit.c
>>>> new file mode 100644
>>>> index 000000000000..e96ea308212c
>>>> --- /dev/null
>>>> +++ b/arch/arm64/kvm/rme-exit.c
>> ...
>>>> +static int rec_exit_ripas_change(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + struct kvm *kvm = vcpu->kvm;
>>>> + struct realm *realm = &kvm->arch.realm;
>>>> + struct realm_rec *rec = &vcpu->arch.rec;
>>>> + unsigned long base = rec->run->exit.ripas_base;
>>>> + unsigned long top = rec->run->exit.ripas_top;
>>>> + unsigned long ripas = rec->run->exit.ripas_value;
>>>> + unsigned long top_ipa;
>>>> + int ret;
>>>> +
>>>> + if (!realm_is_addr_protected(realm, base) ||
>>>> + !realm_is_addr_protected(realm, top - 1)) {
>>>> + kvm_err("Invalid RIPAS_CHANGE for %#lx - %#lx, ripas: %#lx\n",
>>>> + base, top, ripas);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
>>>> + kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
>>>
>>> I think we also need to filter the request for RIPAS_RAM, by consulting
>>> if the "range" is backed by a memslot or not. If they are not, we should
>>> reject the request with a response flag set in run.enter.flags.
>>
>> It's an interesting API question. At the moment there is no requirement
>> to have an active memslot to set the RIPAS - this is true both during
>> the setup by the VMM and at run time.
>>
>> In theory a VMM can create/destroy memslots while the guest is running.
>> So the absence of a memslot doesn't actually imply that the RIPAS change
>
> Agreed. Whether an IPA range may be used as RAM is a decision that the
> VMM must make. So, we could give the VMM a chance to respond to this
> request before we (KVM) make the RTT changes.
>
>> should be rejected. Obviously with realms this is tricky because when
>> destroying a memslot that's in use KVM would rip those pages out from
>> the guest and it would require guest cooperation to restore those pages
>> (transition to RIPAS_EMPTY and back to RIPAS_RAM). But it's not
>> something that has been prohibited so far.
>
> True, and it shouldn't be prohibited. If the Host wants to take away a
> memslot it must be able to do that. But if it wants to do that in
> good faith with the Realm, there must have been some communication
> (e.g., virtio-mem ?) between the Host and the Realm and as long as the
> Realm knows not to trust the contents of that region it could be
> recovered without a transition to EMPTY.
>
> e.g. From RIPAS_DESTROYED => RIPAS_RAM with RSI_SET_IPA_STATE(...
> CHANGE_DESTROYED).
Indeed - I always forget RSI_SET_IPA_STATE has two modes these days.
>>
>> On the other hand this is a clear way for a (malicious/buggy) guest to
>> use a fair bit of RAM by transitioning to RIPAS_RAM (sparse) pages not
>> in a memslot and forcing KVM to allocate the RTT pages to delegate to
>> the RMM. But we do exit to the VMM, so this is solvable in the VMM (by
>> killing a misbehaving guest). The number of pages this would consume per
>> exit is also fairly small.
>
> Correct. If the VMM has no intention to provide memory at a given IPA
> range, KVM shouldn't report RSI_ACCEPT to the Realm and the Realm later
> gets a stage2 fault that cannot be serviced by KVM.
>
>>
>> So my instinct is that we shouldn't impose that requirement.
>
> I think we may be able to fix this by letting the VMM ACCEPT or REJECT
> a given RIPAS_RAM transition request. That way, KVM isn't playing by
> the rules set by the VMM and whether the VMM wants to trick the Realm
> or play by the rules is up to it.
Sounds good to me.
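To make the ACCEPT/REJECT idea concrete, here is a minimal sketch of the
decision a VMM might make once it is notified of a RIPAS_RAM request. It is
a standalone model, not kernel or VMM code: struct vmm_memslot,
enum ripas_response and vmm_decide_ripas_ram() are hypothetical names, and
the policy (accept only if the whole range sits inside a single slot) is
just one assumption about what a well-behaved VMM could do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMM-side record of a memslot: [base, base + size) in IPA space. */
struct vmm_memslot {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical response the VMM hands back to KVM for a RIPAS_RAM request. */
enum ripas_response { RIPAS_ACCEPT, RIPAS_REJECT };

/* True if [base, top) lies entirely inside the slot (overflow-safe form). */
static bool range_in_slot(const struct vmm_memslot *s,
			  uint64_t base, uint64_t top)
{
	return base >= s->base && top - s->base <= s->size;
}

/*
 * Assumed policy: accept the transition only when the VMM intends to
 * provide memory for the whole range, i.e. it is covered by one memslot.
 */
enum ripas_response vmm_decide_ripas_ram(const struct vmm_memslot *slots,
					 size_t nr_slots,
					 uint64_t base, uint64_t top)
{
	assert(base < top);

	for (size_t i = 0; i < nr_slots; i++) {
		if (range_in_slot(&slots[i], base, top))
			return RIPAS_ACCEPT;
	}

	return RIPAS_REJECT;
}
```

A VMM that wants to allow ranges spanning several adjacent memslots, or to
kill a guest that keeps requesting unbacked ranges, would replace the loop
with its own policy; the point is only that the decision sits in the VMM.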
>>
>> Any thoughts?
>>
>>> As for EMPTY requests, if the guest wants to explicitly mark any range
>>> as EMPTY, it doesn't matter, as long as it is within the protected IPA.
>>> (even though they may be EMPTY in the first place).
>>>
>>>> + write_lock(&kvm->mmu_lock);
>>>> + ret = realm_set_ipa_state(vcpu, base, top, ripas, &top_ipa);
>>>> + write_unlock(&kvm->mmu_lock);
>>>> +
>>>> + WARN(ret && ret != -ENOMEM,
>>>> + "Unable to satisfy RIPAS_CHANGE for %#lx - %#lx, ripas: %#lx\n",
>>>> + base, top, ripas);
>>>> +
>>>> + /* Exit to VMM to complete the change */
>>>> + kvm_prepare_memory_fault_exit(vcpu, base, top_ipa - base, false, false,
>>>> + ripas == RMI_RAM);
>>>
>>> Again this may only be needed if the range is backed by a memslot?
>>> Otherwise the VMM has nothing to do.
>>
>> Assuming the above, then the VMM would be the one to kill a misbehaving
>> guest, so would need a notification.
>
> Maybe we could reverse the order of operations by delaying the
> realm_set_ipa_state() to occur on the VMM's request from the
> memory_fault_exit.
Ah, good point - moving the RIPAS state set to the entry path makes a
lot of sense. The only negative is that we push the loop handling
partial RIPAS changes into the KVM entry path - but I don't think that's
a major problem.
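Roughly what I have in mind, as a standalone model rather than the real
code: struct pending_ripas and the model_*() functions are made up for
illustration, and model_set_ipa_state() only imitates the "partial
progress" behaviour that top_ipa reports for realm_set_ipa_state(). The
exit path just records the request; the entry path applies it if the VMM
accepted, looping until the whole range is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000ULL

/* Hypothetical per-vCPU record of a RIPAS change awaiting the VMM's answer. */
struct pending_ripas {
	bool valid;
	bool accepted;		/* set by the VMM before re-entering */
	uint64_t base;
	uint64_t top;
};

/*
 * Stand-in for realm_set_ipa_state(): handles at most max_pages per call
 * and reports how far it got through *out_top.
 */
static int model_set_ipa_state(uint64_t base, uint64_t top,
			       uint64_t max_pages, uint64_t *out_top)
{
	uint64_t end = base + max_pages * MODEL_PAGE_SIZE;

	*out_top = end < top ? end : top;
	return 0;
}

/* Exit path: record the request and leave the decision to the VMM. */
void model_exit_ripas_change(struct pending_ripas *p,
			     uint64_t base, uint64_t top)
{
	assert(base < top);
	p->valid = true;
	p->accepted = false;
	p->base = base;
	p->top = top;
}

/*
 * Entry path: apply an accepted request, looping over partial progress.
 * Returns the number of calls needed, 0 if nothing was applied, or a
 * negative value on error.
 */
int model_enter_apply_ripas(struct pending_ripas *p, uint64_t max_pages)
{
	int calls = 0;

	assert(max_pages > 0);

	if (!p->valid)
		return 0;
	p->valid = false;

	/* A rejected request would be reported back via run.enter.flags. */
	if (!p->accepted)
		return 0;

	while (p->base < p->top) {
		uint64_t done;

		if (model_set_ipa_state(p->base, p->top, max_pages, &done))
			return -1;
		p->base = done;
		calls++;
	}

	return calls;
}
```

The real entry path would also need the memory cache top-up and the
mmu_lock around each step, which this model leaves out.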
Thanks,
Steve
>
> Suzuki
>
>>
>> Thanks,
>> Steve
>>
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void update_arch_timer_irq_lines(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + struct realm_rec *rec = &vcpu->arch.rec;
>>>> +
>>>> + __vcpu_sys_reg(vcpu, CNTV_CTL_EL0) = rec->run->exit.cntv_ctl;
>>>> + __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0) = rec->run->exit.cntv_cval;
>>>> + __vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = rec->run->exit.cntp_ctl;
>>>> + __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = rec->run->exit.cntp_cval;
>>>> +
>>>> + kvm_realm_timers_update(vcpu);
>>>> +}
>>>> +
>>>> +/*
>>>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>>>> + * proper exit to userspace.
>>>> + */
>>>> +int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_ret)
>>>> +{
>>>> + struct realm_rec *rec = &vcpu->arch.rec;
>>>> + u8 esr_ec = ESR_ELx_EC(rec->run->exit.esr);
>>>> + unsigned long status, index;
>>>> +
>>>> + status = RMI_RETURN_STATUS(rec_run_ret);
>>>> + index = RMI_RETURN_INDEX(rec_run_ret);
>>>> +
>>>> + /*
>>>> + * If a PSCI_SYSTEM_OFF request raced with a vcpu executing, we might
>>>> + * see the following status code and index indicating an attempt to run
>>>> + * a REC when the RD state is SYSTEM_OFF. In this case, we just need to
>>>> + * return to user space which can deal with the system event or will try
>>>> + * to run the KVM VCPU again, at which point we will no longer attempt
>>>> + * to enter the Realm because we will have a sleep request pending on
>>>> + * the VCPU as a result of KVM's PSCI handling.
>>>> + */
>>>> + if (status == RMI_ERROR_REALM && index == 1) {
>>>> + vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
>>>> + return 0;
>>>> + }
>>>> +
>>>> + if (rec_run_ret)
>>>> + return -ENXIO;
>>>> +
>>>> + vcpu->arch.fault.esr_el2 = rec->run->exit.esr;
>>>> + vcpu->arch.fault.far_el2 = rec->run->exit.far;
>>>> + vcpu->arch.fault.hpfar_el2 = rec->run->exit.hpfar;
>>>> +
>>>> + update_arch_timer_irq_lines(vcpu);
>>>> +
>>>> + /* Reset the emulation flags for the next run of the REC */
>>>> + rec->run->enter.flags = 0;
>>>> +
>>>> + switch (rec->run->exit.exit_reason) {
>>>> + case RMI_EXIT_SYNC:
>>>> + return rec_exit_handlers[esr_ec](vcpu);
>>>> + case RMI_EXIT_IRQ:
>>>> + case RMI_EXIT_FIQ:
>>>> + return 1;
>>>> + case RMI_EXIT_PSCI:
>>>> + return rec_exit_psci(vcpu);
>>>> + case RMI_EXIT_RIPAS_CHANGE:
>>>> + return rec_exit_ripas_change(vcpu);
>>>> + }
>>>> +
>>>> + kvm_pr_unimpl("Unsupported exit reason: %u\n",
>>>> + rec->run->exit.exit_reason);
>>>> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>>>> + return 0;
>>>> +}
>>>> diff --git a/arch/arm64/kvm/rme.c b/arch/arm64/kvm/rme.c
>>>> index 1fa9991d708b..4c0751231810 100644
>>>> --- a/arch/arm64/kvm/rme.c
>>>> +++ b/arch/arm64/kvm/rme.c
>>>> @@ -899,6 +899,25 @@ void kvm_destroy_realm(struct kvm *kvm)
>>>> kvm_free_stage2_pgd(&kvm->arch.mmu);
>>>> }
>>>> +int kvm_rec_enter(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + struct realm_rec *rec = &vcpu->arch.rec;
>>>> +
>>>> + switch (rec->run->exit.exit_reason) {
>>>> + case RMI_EXIT_HOST_CALL:
>>>> + case RMI_EXIT_PSCI:
>>>> + for (int i = 0; i < REC_RUN_GPRS; i++)
>>>> + rec->run->enter.gprs[i] = vcpu_get_reg(vcpu, i);
>>>> + break;
>>>> + }
>>>
>>> As mentioned in the patch following (MMIO emulation support), we may be
>>> able to do this unconditionally for all REC entries, to cover ourselves
>>> from missing out other cases. The RMM is in charge of taking the
>>> appropriate action anyways to copy the results back.
>>>
>>> Suzuki
>>>
>>>> +
>>>> + if (kvm_realm_state(vcpu->kvm) != REALM_STATE_ACTIVE)
>>>> + return -EINVAL;
>>>> +
>>>> + return rmi_rec_enter(virt_to_phys(rec->rec_page),
>>>> + virt_to_phys(rec->run));
>>>> +}
>>>> +
>>>> static void free_rec_aux(struct page **aux_pages,
>>>> unsigned int num_aux)
>>>> {
>>
>