Message-ID: <09d89355cdbbd19c456699774a9a980a@kernel.org>
Date: Fri, 22 Jan 2021 09:45:26 +0000
From: Marc Zyngier <maz@...nel.org>
To: Keqian Zhu <zhukeqian1@...wei.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
kvm@...r.kernel.org, kvmarm@...ts.cs.columbia.edu,
Will Deacon <will@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Mark Rutland <mark.rutland@....com>,
James Morse <james.morse@....com>,
Robin Murphy <robin.murphy@....com>,
Joerg Roedel <joro@...tes.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>,
Suzuki K Poulose <suzuki.poulose@....com>,
Julien Thierry <julien.thierry.kdev@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexios Zavras <alexios.zavras@...el.com>,
wanghaibin.wang@...wei.com, jiangkunkun@...wei.com
Subject: Re: [RFC PATCH] kvm: arm64: Try stage2 block mapping for host device MMIO
On 2021-01-22 08:36, Keqian Zhu wrote:
> The MMIO region of a device may be huge (GB level), try to use block
> mapping in stage2 to speed up both map and unmap.
>
> Especially for unmap, it performs TLBI right after each invalidation
> of PTE. If all mapping is of PAGE_SIZE, it takes much time to handle
> GB level range.
This is only on VM teardown, right? Or do you unmap the device more
often?
Can you please quantify the speedup and the conditions this occurs in?
I have the feeling that we are just circling around another problem,
which is that we could rely on a VM-wide TLBI when tearing down the
guest. I worked on something like that[1] a long while ago, and parked
it for some reason. Maybe it is worth reviving.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/elide-cmo-tlbi
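As a rough way to frame the numbers being asked for, here is a minimal sketch that counts how many leaf entries (and therefore how many per-entry invalidations, if one TLBI follows each PTE as the commit message describes) cover a given range. It assumes a 4 KiB granule with 9 bits resolved per level; the `sketch_*` names are made up for illustration and are not kernel symbols.

```c
#include <stdint.h>

/* Assumption: 4 KiB granule, so level 3 = 4 KiB pages,
 * level 2 = 2 MiB blocks, level 1 = 1 GiB blocks. */
#define SKETCH_PAGE_SHIFT 12

/* Size in bytes covered by one entry at the given level. */
static uint64_t sketch_granule_size(unsigned int level)
{
	/* Each level resolves 9 address bits with a 4 KiB granule. */
	return 1ULL << (SKETCH_PAGE_SHIFT + 9 * (3 - level));
}

/*
 * Number of leaf entries needed to cover 'size' bytes when every
 * mapping is made at 'level'. 'size' is assumed to be a multiple
 * of the granule size for that level.
 */
static uint64_t sketch_leaf_entries(uint64_t size, unsigned int level)
{
	return size / sketch_granule_size(level);
}
```

Under these assumptions a 1 GiB region needs 262144 entries at page granularity, 512 with 2 MiB blocks, and a single entry with a 1 GiB block. Measured teardown time would still be needed to show how much of that translates into a real speedup, and a VM-wide invalidation would presumably make the count irrelevant at teardown.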
>
> Signed-off-by: Keqian Zhu <zhukeqian1@...wei.com>
> ---
> arch/arm64/include/asm/kvm_pgtable.h | 11 +++++++++++
> arch/arm64/kvm/hyp/pgtable.c | 15 +++++++++++++++
> arch/arm64/kvm/mmu.c | 12 ++++++++----
> 3 files changed, 34 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 52ab38db04c7..2266ac45f10c 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -82,6 +82,17 @@ struct kvm_pgtable_walker {
> const enum kvm_pgtable_walk_flags flags;
> };
>
> +/**
> + * kvm_supported_pgsize() - Get the max supported page size of a mapping.
> + * @pgt: Initialised page-table structure.
> + * @addr: Virtual address at which to place the mapping.
> + * @end: End virtual address of the mapping.
> + * @phys: Physical address of the memory to map.
> + *
> + * The smallest return value is PAGE_SIZE.
> + */
> +u64 kvm_supported_pgsize(struct kvm_pgtable *pgt, u64 addr, u64 end, u64 phys);
> +
> /**
> * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table.
> * @pgt: Uninitialised page-table structure to initialise.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index bdf8e55ed308..ab11609b9b13 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -81,6 +81,21 @@ static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule);
> }
>
> +u64 kvm_supported_pgsize(struct kvm_pgtable *pgt, u64 addr, u64 end, u64 phys)
> +{
> + u32 lvl;
> + u64 pgsize = PAGE_SIZE;
> +
> + for (lvl = pgt->start_level; lvl < KVM_PGTABLE_MAX_LEVELS; lvl++) {
> + if (kvm_block_mapping_supported(addr, end, phys, lvl)) {
> + pgsize = kvm_granule_size(lvl);
> + break;
> + }
> + }
> +
> + return pgsize;
> +}
> +
> static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> {
> u64 shift = kvm_granule_shift(level);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7d2257cc5438..80b403fc8e64 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -499,7 +499,8 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> phys_addr_t pa, unsigned long size, bool writable)
> {
> - phys_addr_t addr;
> + phys_addr_t addr, end;
> + unsigned long pgsize;
> int ret = 0;
> struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
> struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> @@ -509,21 +510,24 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>
> size += offset_in_page(guest_ipa);
> guest_ipa &= PAGE_MASK;
> + end = guest_ipa + size;
>
> - for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
> + for (addr = guest_ipa; addr < end; addr += pgsize) {
> ret = kvm_mmu_topup_memory_cache(&cache,
> kvm_mmu_cache_min_pages(kvm));
> if (ret)
> break;
>
> + pgsize = kvm_supported_pgsize(pgt, addr, end, pa);
> +
> spin_lock(&kvm->mmu_lock);
> - ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
> + ret = kvm_pgtable_stage2_map(pgt, addr, pgsize, pa, prot,
> &cache);
> spin_unlock(&kvm->mmu_lock);
> if (ret)
> break;
>
> - pa += PAGE_SIZE;
> + pa += pgsize;
> }
>
> kvm_mmu_free_memory_cache(&cache);
This otherwise looks neat enough.
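For anyone wanting to play with the size selection outside the kernel, the following is a simplified user-space model of the loop in kvm_supported_pgsize(). It is only a sketch: it assumes a 4 KiB granule and four levels, and the level-0 rejection and the "granule must fit in the remaining range" check are assumptions about what kvm_block_mapping_supported() does, since only its alignment test is visible in the quoted hunk. All `model_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 4 KiB granule, levels 0..3, 9 bits resolved per level. */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGE_SIZE		(1ULL << MODEL_PAGE_SHIFT)
#define MODEL_MAX_LEVELS	4

/* Bytes covered by a single entry at 'level'. */
static uint64_t model_granule(unsigned int level)
{
	return 1ULL << (MODEL_PAGE_SHIFT + 9 * (MODEL_MAX_LEVELS - 1 - level));
}

/* Assumed stand-in for kvm_block_mapping_supported(). */
static bool model_block_ok(uint64_t addr, uint64_t end, uint64_t phys,
			   unsigned int level)
{
	uint64_t granule = model_granule(level);

	/* Assumed: no block mappings at the top level. */
	if (level == 0)
		return false;

	/* Assumed: the block must not extend past the end of the range. */
	if (granule > end - addr)
		return false;

	/* From the quoted hunk: both addresses must be granule-aligned. */
	return !(addr & (granule - 1)) && !(phys & (granule - 1));
}

/* Mirrors the patch: first (largest) level that fits, else a page. */
static uint64_t model_supported_pgsize(unsigned int start_level, uint64_t addr,
				       uint64_t end, uint64_t phys)
{
	unsigned int lvl;

	for (lvl = start_level; lvl < MODEL_MAX_LEVELS; lvl++) {
		if (model_block_ok(addr, end, phys, lvl))
			return model_granule(lvl);
	}

	return MODEL_PAGE_SIZE;
}
```

In this model, a 1 GiB-aligned IPA and PA with at least 1 GiB remaining selects a 1 GiB block, while a PA that is only page-aligned falls back to 4 KiB regardless of how well the IPA is aligned.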
Thanks,
M.
--
Jazz is not dead. It just smells funny...