[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170706094348.GF18106@cbox>
Date: Thu, 6 Jul 2017 11:43:48 +0200
From: Christoffer Dall <cdall@...aro.org>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: Alexander Graf <agraf@...e.de>,
Suzuki K Poulose <Suzuki.Poulose@....com>,
kvmarm@...ts.cs.columbia.edu, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2] KVM: arm/arm64: Handle hva aging while destroying the
vm
Hi Andrea,
On Thu, Jul 06, 2017 at 11:31:26AM +0200, Andrea Arcangeli wrote:
> Hello,
>
> On Thu, Jul 06, 2017 at 09:45:13AM +0200, Christoffer Dall wrote:
> > Let's look at the callers to stage2_get_pmd, which is the only caller of
> > stage2_get_pud, where the problem was observed:
> >
> > user_mem_abort
> > -> stage2_set_pmd_huge
> > -> stage2_get_pmd
> >
> > user_mem_abort
> > -> stage2_set_pte
> > -> stage2_get_pmd
> >
> > handle_access_fault
> > -> stage2_get_pmd
> >
> > For the above three functions, pgd cannot ever be NULL, because this is
> > running in the context of a VCPU thread, which means the reference on
> > the VM fd must not reach zero, so no need to call that here.
>
> Just a minor nitpick: the !pgd bypass is necessary before the KVM fd
> technically reaches zero.
>
> exit_mm->mmput->exit_mmap() will invoke the __mmu_notifier_release
> even if the KVM fd isn't zero yet.
But is exit_mm possible when you have VCPU *threads* running in the VCPU
KVM_RUN ioctl ?
>
> This is because the secondary MMU page faults must be shutdown before
> freeing the guest RAM (nothing can call handle_mm_fault or any
> get_user_pages after mm->mm_users == 0), regardless if
> mmu_notifier_unregister hasn't been called yet (i.e. if the /dev/kvm
> fd is still open).
>
> Usually the fd is closed immediately after exit_mmap, as exit_files is
> called shortly after exit_mm() but there's a common window where the
> fd is still open but the !pgd check is already necessary (plus the fd
> could in theory be passed to other processes).
>
> > using the kvm->mmu_lock() and understanding that this only happens when
> > mmu notifiers call into the KVM MMU code outside the context of the VM.
>
> Agreed.
>
> The other arches don't need any special check to serialize against
> kvm_mmu_notifier_release, they're just looking up shadow pagetables
> through spte rmap (and they'll find nothing if
> kvm_mmu_notifier_release already run).
>
> In theory it would make more sense to put the overhead in the slow
> path by adding a mutex to the mmu_notifier struct and then using that
> to solve the race between mmu_notifier_release and
> mmu_notifier_unregister, and then to hlist_del_init to unhash the mmu
> notifier and then to call synchronize_srcu, before calling ->release
> while holding some mutex. However that's going to be marginally slower
> for the other arches.
>
> In practice I doubt this is measurable and getting away with one less
> mutex in mmu notifier_release vs mmu_notifier_unregister sounds
> simpler but comments welcome...
>
I think just checking the pgd pointer under the mmu_lock is completely
fine for arm/arm64, as long as we understand what's going on.
Thanks,
-Christoffer
Powered by blists - more mailing lists