Message-ID: <4b8a939172395bf38e581634abecf925@kernel.org>
Date: Mon, 25 May 2020 16:44:53 +0100
From: Marc Zyngier <maz@...nel.org>
To: Keqian Zhu <zhukeqian1@...wei.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.cs.columbia.edu, kvm@...r.kernel.org,
Catalin Marinas <catalin.marinas@....com>,
James Morse <james.morse@....com>,
Will Deacon <will@...nel.org>,
Suzuki K Poulose <suzuki.poulose@....com>,
Sean Christopherson <sean.j.christopherson@...el.com>,
Julien Thierry <julien.thierry.kdev@...il.com>,
Mark Brown <broonie@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexios Zavras <alexios.zavras@...el.com>,
wanghaibin.wang@...wei.com, zhengxiang9@...wei.com
Subject: Re: [RFC PATCH 0/7] kvm: arm64: Support stage2 hardware DBM
On 2020-05-25 12:23, Keqian Zhu wrote:
> This patch series add support for stage2 hardware DBM, and it is only
> used for dirty log for now.
>
> It works well under some migration test cases, including VM with 4K
> pages or 2M THP. I checked the SHA256 hash digest of all memory and
> they keep same for source VM and destination VM, which means no dirty
> pages is missed under hardware DBM.
>
> However, there are some known issues not solved.
>
> 1. Some mechanisms that rely on "write permission fault" become invalid,
> such as kvm_set_pfn_dirty and "mmap page sharing".
>
> kvm_set_pfn_dirty is called in user_mem_abort when guest issues write
> fault. This guarantees physical page will not be dropped directly when
> host kernel recycle memory. After using hardware dirty management, we
> have no chance to call kvm_set_pfn_dirty.
Then you will end up with memory corruption under memory pressure.
This also breaks things like CoW, which we depend on.
>
> For "mmap page sharing" mechanism, host kernel will allocate a new
> physical page when guest writes a page that is shared with other page
> table entries. After using hardware dirty management, we have no chance
> to do this too.
>
> I need to do some survey on how stage1 hardware DBM solve these problems.
> It helps if anyone can figure it out.
>
> 2. Page Table Modification Races: Though I have found and solved some data
> races when kernel changes page table entries, I still doubt that there
> are data races I am not aware of. It's great if anyone can figure
> them out.
>
> 3. Performance: Under Kunpeng 920 platform, for every 64GB memory, KVM
> consumes about 40ms to traverse all PTEs to collect dirty log. It will
> cause unbearable downtime for migration if memory size is too big. I will
> try to solve this problem in Patch v1.
This, in my opinion, is why Stage-2 DBM is fairly useless.
From a performance perspective, this is the worst possible
situation. You end up continuously scanning page tables, at
an arbitrary rate, without a way to evaluate the fault rate.
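
To put the quoted figure in perspective, here is a rough back-of-the-envelope
sketch. It assumes 4K pages, treats "64GB" as 64 GiB, and assumes the scan
cost grows linearly with memory size; none of these assumptions come from
measurements beyond the single 40ms-per-64GB number quoted above, and the
function names are purely illustrative.

```python
GIB = 2**30

def pte_count(mem_gib, page_size=4096):
    """Number of last-level PTEs needed to map mem_gib GiB (assumes 4K pages)."""
    return mem_gib * GIB // page_size

def scan_time_ms(mem_gib, ms_per_64gib=40.0):
    """Linear extrapolation of the quoted 40 ms per 64 GB figure (an assumption)."""
    return mem_gib / 64 * ms_per_64gib

for mem_gib in (64, 256, 1024):
    ptes = pte_count(mem_gib)
    ms = scan_time_ms(mem_gib)
    print(f"{mem_gib:5d} GiB: {ptes:>11,d} PTEs, ~{ms:.0f} ms per scan, "
          f"~{ms * 1e6 / ptes:.2f} ns per PTE")
```

Under these assumptions the cost is paid on every scan and depends on the
amount of mapped memory, not on how many pages the guest actually dirtied.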
One thing S2-DBM would be useful for is SVA, where a device
write would mark the S2 PTs dirty as they are shared between
CPU and SMMU. Another thing is SPE, which is essentially a DMA
agent using the CPU's PTs.
But on its own, and just to log the dirty pages, S2-DBM is
pretty rubbish. I wish arm64 had something like Intel's PML,
which looks far more interesting for the purpose of tracking
accesses.
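
As a toy illustration of the difference, the sketch below contrasts a
scan-based collector (every entry is visited on each collection, as with
S2-DBM dirty logging) with a log-based one (only newly dirtied pages are
recorded, which is the general idea behind a modification log). The class
names and the "visited" counter are hypothetical and model neither the arm64
page tables nor Intel's actual PML hardware interface.

```python
class ScanTracker:
    """Dirty state lives in per-page flags; collection walks all of them."""
    def __init__(self, n_pages):
        self.dirty = [False] * n_pages

    def write(self, page):
        self.dirty[page] = True

    def collect(self):
        visited = len(self.dirty)  # every entry is inspected
        pages = [i for i, d in enumerate(self.dirty) if d]
        self.dirty = [False] * len(self.dirty)
        return pages, visited


class LogTracker:
    """First write to a clean page appends it to a log; collection drains the log."""
    def __init__(self, n_pages):
        self.dirty = [False] * n_pages
        self.log = []

    def write(self, page):
        if not self.dirty[page]:  # only the clean -> dirty transition is logged
            self.dirty[page] = True
            self.log.append(page)

    def collect(self):
        visited = len(self.log)  # work is proportional to dirtied pages
        pages = sorted(self.log)
        for p in self.log:
            self.dirty[p] = False
        self.log = []
        return pages, visited


scan, log = ScanTracker(1000), LogTracker(1000)
for page in (3, 7, 3, 999):
    scan.write(page)
    log.write(page)
print(scan.collect())
print(log.collect())
```

Both collectors report the same set of dirty pages; the difference is in how
much work each collection does, and the length of the log also gives a direct
view of the dirtying rate.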
Thanks,
M.
--
Jazz is not dead. It just smells funny...