Message-ID: <8e2ee8dd-4412-4133-8b08-75d64ab79649@os.amperecomputing.com>
Date: Wed, 6 Mar 2024 11:01:09 +0530
From: Ganapatrao Kulkarni <gankulkarni@...amperecomputing.com>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: kvmarm@...ts.linux.dev, kvm@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
maz@...nel.org, darren@...amperecomputing.com,
d.scott.phillips@...erecomputing.com, James Morse <james.morse@....com>,
Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>
Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables
On 05-03-2024 02:16 pm, Oliver Upton wrote:
> -cc old kvmarm list
> +cc new kvmarm list, reviewers
>
> Please run scripts/get_maintainer.pl next time around so we get the
> right people looking at a patch.
>
Of course I know this script :-)
I didn't Cc more people because I wanted to avoid unnecessarily
overloading anyone's inbox. I don't think anyone (even Arm) is
interested in this feature other than Marc and me/Ampere; otherwise it
would have been merged upstream by now.
BTW, NV feature development started way back in 2016/17.
> On Mon, Mar 04, 2024 at 09:46:06PM -0800, Ganapatrao Kulkarni wrote:
>> @@ -216,6 +223,13 @@ struct kvm_s2_mmu {
>> * >0: Somebody is actively using this.
>> */
>> atomic_t refcnt;
>> +
>> + /*
>> + * For a Canonical IPA to Shadow IPA mapping.
>> + */
>> + struct rb_root nested_mapipa_root;
>
> There isn't any benefit to tracking the canonical IPA -> shadow IPA(s)
> mapping on a per-S2 basis, as there already exists a one-to-many problem
> (more below). Maintaining a per-VM data structure (since this is keyed
> by canonical IPA) makes a bit more sense.
>
>> + rwlock_t mmu_lock;
>> +
>
> Err, is there any reason the existing mmu_lock is insufficient here?
> Surely taking a new reference on a canonical IPA for a shadow S2 must be
> done behind the MMU lock for it to be safe against MMU notifiers...
>
> Also, reusing the exact same name for it is sure to produce some
> lock-imbalance funnies.
>
>> };
>>
>> static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> index da7ebd2f6e24..c31a59a1fdc6 100644
>> --- a/arch/arm64/include/asm/kvm_nested.h
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -65,6 +65,9 @@ extern void kvm_init_nested(struct kvm *kvm);
>> extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
>> extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
>> extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
>> +extern void add_shadow_ipa_map_node(
>> + struct kvm_s2_mmu *mmu,
>> + phys_addr_t ipa, phys_addr_t shadow_ipa, long size);
>
> style nitpick: no newline between the open bracket and first parameter.
> Wrap as needed at 80 (or a bit more) columns.
>
>> +/*
>> + * Create a node and add to lookup table, when a page is mapped to
>> + * Canonical IPA and also mapped to Shadow IPA.
>> + */
>> +void add_shadow_ipa_map_node(struct kvm_s2_mmu *mmu,
>> + phys_addr_t ipa,
>> + phys_addr_t shadow_ipa, long size)
>> +{
>> + struct rb_root *ipa_root = &(mmu->nested_mapipa_root);
>> + struct rb_node **node = &(ipa_root->rb_node), *parent = NULL;
>> + struct mapipa_node *new;
>> +
>> + new = kzalloc(sizeof(struct mapipa_node), GFP_KERNEL);
>> + if (!new)
>> + return;
>
> Should be GFP_KERNEL_ACCOUNT, you want to charge this to the user.
>
>> +
>> + new->shadow_ipa = shadow_ipa;
>> + new->ipa = ipa;
>> + new->size = size;
>
> What about aliasing? You could have multiple shadow IPAs that point to
> the same canonical IPA, even within a single MMU.
>
>> + write_lock(&mmu->mmu_lock);
>> +
>> + while (*node) {
>> + struct mapipa_node *tmp;
>> +
>> + tmp = container_of(*node, struct mapipa_node, node);
>> + parent = *node;
>> + if (new->ipa < tmp->ipa) {
>> + node = &(*node)->rb_left;
>> + } else if (new->ipa > tmp->ipa) {
>> + node = &(*node)->rb_right;
>> + } else {
>> + write_unlock(&mmu->mmu_lock);
>> + kfree(new);
>> + return;
>> + }
>> + }
>> +
>> + rb_link_node(&new->node, parent, node);
>> + rb_insert_color(&new->node, ipa_root);
>> + write_unlock(&mmu->mmu_lock);
>
> Meh, one of the annoying things with rbtree is you have to build your
> own search functions...
>
> It would appear that the rbtree intends to express intervals (i.e. GPA +
> size), but the search implementation treats GPA as an index. So I don't
> think this works as intended.
>
> Have you considered other abstract data types (e.g. xarray, maple tree)
> and how they might apply here?
>
Thanks for suggesting the maple-tree-based lookup; I will try it in the
next version.
>> +bool get_shadow_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa, phys_addr_t *shadow_ipa, long *size)
>> +{
>> + struct rb_node *node;
>> + struct mapipa_node *tmp = NULL;
>> +
>> + read_lock(&mmu->mmu_lock);
>> + node = mmu->nested_mapipa_root.rb_node;
>> +
>> + while (node) {
>> + tmp = container_of(node, struct mapipa_node, node);
>> +
>> + if (tmp->ipa == ipa)
>> + break;
>> + else if (ipa > tmp->ipa)
>> + node = node->rb_right;
>> + else
>> + node = node->rb_left;
>> + }
>> +
>> + read_unlock(&mmu->mmu_lock);
>> +
>> + if (tmp && tmp->ipa == ipa) {
>> + *shadow_ipa = tmp->shadow_ipa;
>> + *size = tmp->size;
>> + write_lock(&mmu->mmu_lock);
>> + rb_erase(&tmp->node, &mmu->nested_mapipa_root);
>> + write_unlock(&mmu->mmu_lock);
>> + kfree(tmp);
>> + return true;
>> + }
>
> Implicitly evicting the entry isn't going to work if we want to use it
> for updates to a stage-2 that do not evict the mapping, like write
> protection or access flag updates.
>
Thanks,
Ganapat