linux-kernel - Re: [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs

Message-ID: <2023082423-ninetieth-hamlet-54dc@gregkh>
Date:   Thu, 24 Aug 2023 08:53:40 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     stable@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq
 that hangs vCPUs

On Wed, Aug 23, 2023 at 06:01:04PM -0700, Sean Christopherson wrote:
> Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
> when checking to see if a page fault is stale, as the sequence count is
> stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
> where KVM will effectively hang vCPUs due to always thinking page faults
> are stale, which results in KVM refusing to "fix" faults.
> 
> mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
> KVM is handling page faults to detect if userspace mappings relevant to
> the guest were invalidated between snapshotting the counter and acquiring
> mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
> resolve the page fault is fresh.  If KVM sees that the counter has
> changed, KVM simply resumes the guest without fixing the fault.
> 
> What _should_ happen is that the source of the mmu_notifier invalidations
> eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
> again fix guest page fault(s).
> 
> But for a long-lived VM and/or a VM that the host just doesn't particularly
> like, it's possible for a VM to be on the receiving end of 2 billion (with
> a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
> mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
> negative value when implicitly cast to an "int" by is_page_fault_stale(),
> and then sign-extended into a 64-bit unsigned when the signed "int" is
> implicitly cast back to an "unsigned long" on the call to
> mmu_invalidate_retry_hva().
> 
> As a result of the casting and sign-extension, given a sequence counter of
> e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
> 
> 	if (0x8002dc25 != 0xffffffff8002dc25)
> 
> and signals that the page fault is stale and needs to be retried even
> though the sequence counter is stable, and KVM effectively hangs any vCPU
> that takes a page fault (EPT violation or #NPF when TDP is enabled).
> 
> Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
> in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
> how KVM tracks the sequence counter snapshot.
> 
> Reported-by: Brian Rak <brak@...tr.com>
> Reported-by: Amaan Cheval <amaan.cheval@...il.com>
> Reported-by: Eric Wheeler <kvm@...ts.ewheeler.net>
> Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
> Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>

What is the git commit id of this change in Linus's tree?

thanks,

greg k-h
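
For readers following the quoted patch description, the narrowing and sign-extension it describes can be reproduced outside the kernel. The following is a minimal standalone sketch, not KVM code: the helper names (roundtrip_through_int, is_stale_buggy, is_stale_fixed, demo) are hypothetical, fixed-width types stand in for "int" and a 64-bit "unsigned long", and the narrowing step assumes the gcc/clang behavior of keeping the low 32 bits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: pass a 64-bit sequence count through a 32-bit signed
 * integer and widen it again, mimicking the "unsigned long" -> "int" ->
 * "unsigned long" conversions described in the patch.
 */
static uint64_t roundtrip_through_int(uint64_t seq)
{
	/*
	 * Narrowing an out-of-range value is implementation-defined before
	 * C23; gcc and clang keep the low 32 bits (assumed here).
	 */
	int32_t narrowed = (int32_t)seq;

	/* Widening a negative signed value sign-extends it. */
	return (uint64_t)(int64_t)narrowed;
}

/* Staleness check with the snapshot squeezed through a signed 32-bit int. */
static bool is_stale_buggy(uint64_t current_seq, uint64_t snapshot)
{
	return current_seq != roundtrip_through_int(snapshot);
}

/* Staleness check with the snapshot kept at full width. */
static bool is_stale_fixed(uint64_t current_seq, uint64_t snapshot)
{
	return current_seq != snapshot;
}

/* Print both verdicts for a counter that has not changed since the snapshot. */
static void demo(uint64_t seq)
{
	printf("seq=0x%" PRIx64 " widened=0x%" PRIx64 " buggy=%d fixed=%d\n",
	       seq, roundtrip_through_int(seq),
	       is_stale_buggy(seq, seq), is_stale_fixed(seq, seq));

	/* A stable counter must never look stale at full width. */
	assert(!is_stale_fixed(seq, seq));
}
```

Under those assumptions, calling demo(0x8002dc25) from a main() should report the widened snapshot as 0xffffffff8002dc25 and the buggy check as stale even though the counter never moved, whereas any value with bit 31 clear (e.g. 0x7fffffff) survives the round trip unchanged.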