linux-kernel - Re: [PATCH] kvm: x86/mmu: Simplify pte_list

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <ZB38Y2HpvD+aD9fm@google.com>
Date:   Fri, 24 Mar 2023 12:39:15 -0700
From:   Sean Christopherson <seanjc@...gle.com>
To:     Mingwei Zhang <mizhang@...gle.com>
Cc:     linux-kernel@...r.kernel.org,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Lai Jiangshan <jiangshan.ljs@...group.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, kvm@...r.kernel.org
Subject: Re: [PATCH] kvm: x86/mmu: Simplify pte_list_{add|remove}

On Fri, Mar 24, 2023, Mingwei Zhang wrote:
> On Thu, Mar 23, 2023 at 3:51 PM Sean Christopherson <seanjc@...gle.com> wrote:
> >
> > On Fri, 13 Jan 2023 20:29:10 +0800, Lai Jiangshan wrote:
> > > Simplify pte_list_{add|remove} by ensuring all the non-head pte_list_desc
> > > to be full and addition/removal actions being performed on the head.
> > >
> > > To make pte_list_add() return a count as before, @tail_count is also
> > > added to the struct pte_list_desc.
> > >
> > > No visible performace is changed in tests.  But pte_list_add() is no longer
> > > shown in the perf result for the COWed pages even the guest forks millions
> > > of tasks.
> > >
> > > [...]
> >
> > Applied to kvm-x86 mmu, thanks!  I added quite a few comments and a BUG_ON() to
> > sanity check that the head is never empty when trying to remove an entry, but I
> > didn't make anything changes to the code itself.
> >
> > [1/1] kvm: x86/mmu: Simplify pte_list_{add|remove}
> >       https://github.com/kvm-x86/linux/commit/141705b78381
> >
> 
> I am not sure if it is possible, but now spte_count is u32 so does
> tail_count. I wonder if an attacker could use the potential integer
> overflow to trigger this? E.g,: creating a huge number of little L1
> EPTs with the many nGPA-> one GPA? hmm, I think it could overflow
> tail_count? Please double check.

Heh, I had the same reaction and even started reworking the patch, but convinced
myself that there are no new issues as __rmap_add() artificially limits the number
of rmap entries to 1000.  But looking again, mmu_page_add_parent_pte() doesn't
have the same safeguard, so in theory, a VM could overflow tail_count.

I'll stare at this some more next week, I'd really prefer to avoid taking on any
amount of complexity in KVM to handle something that should never occur in
practice.  But being buried deep in __link_shadow_page() makes the error handling
annoying.

> spte_count is u32, but assigned to an (signed) int j and BUG_ON(j <
> 0)? Please don't add more BUG_ON in KVM mmu... and please change
> either 'spte_count' to 'int' or 'j' to u32.

Why?  signed vs. unsigned doesn't change anything.  spte_count is can't be greater
than PTE_LIST_EXT, so overflow is not a concern.  The BUG_ON() is purely to ensure
there's no underflow, i.e. that spte_count > 0, so that the array can be safely
accessed.

> In general, please, no BUG_ON(), at least no more BUG_ON() on our nested MMU...

In this case, the alternative is worse, as writing "head_desc->sptes[j]" with
"j < 0", i.e. a very large value if j is stored as an unsigned, would write
arbitrary kernel memory.

I completely agree that the BUG_ON() is gross, but that's an existing issue that
needs to be remedied.  And to do that, I really want to resurrect and utilize
the KVM_BUG_ON_DATA_CORRUPTION() idea[*].

[*] https://lore.kernel.org/all/Y5dax8XJV0F5adUw@google.com