Message-ID: <20170717182331.GA14069@cbox>
Date: Mon, 17 Jul 2017 20:23:31 +0200
From: Christoffer Dall <cdall@...aro.org>
To: Andrea Arcangeli <aarcange@...hat.com>
Cc: Suzuki K Poulose <Suzuki.Poulose@....com>,
Alexander Graf <agraf@...e.de>,
"kvmarm@...ts.cs.columbia.edu" <kvmarm@...ts.cs.columbia.edu>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Stable <stable@...r.kernel.org>
Subject: Re: [PATCH v2] KVM: arm/arm64: Handle hva aging while destroying the vm
On Mon, Jul 17, 2017 at 05:16:17PM +0200, Andrea Arcangeli wrote:
> On Mon, Jul 17, 2017 at 04:45:10PM +0200, Christoffer Dall wrote:
> > I would also very much like to get to the bottom of this, and at the
> > very least try to get a valid explanation as to how a thread can be
> > *running* for a process where there are zero references to the struct
> > mm?
>
> A thread shouldn't possibly be running if mm->mm_users is zero.
>
OK, good, then I don't have to retake OS 101.
> > I guess I am asking where this mmput() can happen for a perfectly
> > running thread, which hasn't processed signals or exited itself yet.
>
> mmput runs during exit(), after that point the vcpu can't run the KVM
> ioctl anymore.
>
Also very comforting that we agree on this.
> > The dump you reference above seems to indicate that it's happening
> > under memory pressure and trying to unmap memory from the VM to
> > allocate memory to the VM, but all seems to be happening within a VCPU
> > thread, or am I reading this wrong?
>
> In the oops the pgd was none while the KVM vcpu ioctl was running. The
> most likely explanation is that there were two VMs running in parallel
> in the host, and the other one was quitting (mm_count of the other VM
> was zero, while mm_count of the VM that oopsed within the vcpu ioctl
> was > 0). The oops information itself can't tell whether one or two
> VMs were running in the host, so > 1 VM running is the most plausible
> explanation that doesn't break the above invariants.
That's very keenly observed, and a really good explanation.
> It'd be nice
> if Alexander can confirm it, if he remembers about that specific setup
> after a couple of months since it happened.
My guess is that this was observed on the SUSE build machines with
arm64, and Alex usually explains that these machines run *lots* of VMs
at the same time, so this sounds very likely.
Alex, can you confirm this was the type of workload?
>
> Even if there was just one VM running in the host, it would more
> likely mean something inside KVM ARM code is clearing the pgd before
> mm_users reaches zero, i.e. before the last mmput.
I don't think we have this.
>
> It's very unlikely mm_users could have been zero while the vcpu thread
> was running, as many more things would fall apart in such a case, not
> just the needed pgd check during mmu notifier post process exit.
>
That was my rationale exactly. Thanks for confirming!
-Christoffer