linux-kernel - Re: [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b4067e4e-6901-265e-49f8-0142b52f5819@amd.com>
Date:   Wed, 13 Oct 2021 12:49:03 -0500
From:   Brijesh Singh <brijesh.singh@....com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <thomas.lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sergio Lopez <slp@...hat.com>, Peter Gonda <pgonda@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Dov Murik <dovmurik@...ux.ibm.com>,
        Tobin Feldman-Fitzthum <tobin@....com>,
        Borislav Petkov <bp@...en8.de>,
        Michael Roth <michael.roth@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Andi Kleen <ak@...ux.intel.com>, tony.luck@...el.com,
        marcorr@...gle.com, sathyanarayanan.kuppuswamy@...ux.intel.com
Subject: Re: [PATCH Part2 v5 37/45] KVM: SVM: Add support to handle MSR based
 Page State Change VMGEXIT


On 10/13/21 10:24 AM, Sean Christopherson wrote:
> On Wed, Oct 13, 2021, Brijesh Singh wrote:
>>> The more I look at this, the more strongly I feel that private <=> shared conversions
>>> belong in the MMU, and that KVM's SPTEs should be the single source of truth for
>>> shared vs. private.  E.g. add a SPTE_TDP_PRIVATE_MASK in the software available bits.
>>> I believe the only hiccup is the snafu where not zapping _all_ SPTEs on memslot
>>> deletion breaks QEMU+VFIO+GPU, i.e. KVM would lose its canonical info on unrelated
>>> memslot deletion.
>>>
>>> But that is a solvable problem.  Ideally the bug, wherever it is, would be root
>>> caused and fixed.  I believe Peter (and Marc?) is going to work on reproducing
>>> the bug.
>> We have been also setting up VM with Qemu + VFIO + GPU usecase to repro
>> the bug on AMD HW and so far we no luck in reproducing it. Will continue
>> stressing the system to recreate it. Lets hope that Peter (and Marc) can
>> easily recreate on Intel HW so that we can work towards fixing it.
> Are you trying on a modern kernel?  If so, double check that nx_huge_pages is off,
> turning that on caused the bug to disappear.  It should be off for AMD systems,
> but it's worth checking.

Yes, this is a recent kernel. I will double check the nx_huge_pages is off.


>>>> +		if (!rc) {
>>>> +			/*
>>>> +			 * This may happen if another vCPU unmapped the page
>>>> +			 * before we acquire the lock. Retry the PSC.
>>>> +			 */
>>>> +			write_unlock(&kvm->mmu_lock);
>>>> +			return 0;
>>> How will the caller (guest?) know to retry the PSC if KVM returns "success"?
>> If a guest is adhering to the GHCB spec then it will see that hypervisor
>> has not processed all the entry and it should retry the PSC.
> But AFAICT that information isn't passed to the guest.  Even in this single-page
> MSR-based case, the caller will say "all good" on a return of 0.
>
> The "full" path is more obvious, as the caller clearly continues to process
> entries unless there's an actual failure.
>
> +       for (; cur <= end; cur++) {
> +               entry = &info->entries[cur];
> +               gpa = gfn_to_gpa(entry->gfn);
> +               level = RMP_TO_X86_PG_LEVEL(entry->pagesize);
> +               op = entry->operation;
> +
> +               if (!IS_ALIGNED(gpa, page_level_size(level))) {
> +                       rc = PSC_INVALID_ENTRY;
> +                       goto out;
> +               }
> +
> +               rc = __snp_handle_page_state_change(vcpu, op, gpa, level);
> +               if (rc)
> +                       goto out;
> +       }
>
Please see the guest kernel patch #19 [1]. In spec there is no special
code for the retry. The guest will look at the PSC hdr to determine how
many entries were processed by the hypervisor (in this particular case a
0). And at time the guest can do whatever it wants. In the case of Linux
guest, we retry the PSC.

[1]
https://lore.kernel.org/linux-mm/20211008180453.462291-20-brijesh.singh@amd.com/

thanks