lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 19 Jul 2021 09:19:33 -0500
From:   Brijesh Singh <brijesh.singh@....com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-efi@...r.kernel.org, platform-driver-x86@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <thomas.lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sergio Lopez <slp@...hat.com>, Peter Gonda <pgonda@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Dov Murik <dovmurik@...ux.ibm.com>,
        Tobin Feldman-Fitzthum <tobin@....com>,
        Borislav Petkov <bp@...en8.de>,
        Michael Roth <michael.roth@....com>,
        Vlastimil Babka <vbabka@...e.cz>, tony.luck@...el.com,
        npmccallum@...hat.com, brijesh.ksingh@...il.com
Subject: Re: [PATCH Part2 RFC v4 33/40] KVM: SVM: Add support to handle MSR
 based Page State Change VMGEXIT



On 7/16/21 4:00 PM, Sean Christopherson wrote:
> On Wed, Jul 07, 2021, Brijesh Singh wrote:
>> +static int __snp_handle_psc(struct kvm_vcpu *vcpu, int op, gpa_t gpa, int level)
> 
> I can live with e.g. GHCB_MSR_PSC_REQ, but I'd strongly prefer to spell this out,
> e.g. __snp_handle_page_state_change() or whatever.  I had a hell of a time figuring
> out what PSC was the first time I saw it in some random context.

Based on the previous review feedback I renamed from 
__snp_handle_page_state_change to __snp_handle_psc(). I will see what 
others say and based on that will rename accordingly.

> 
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	int rc, tdp_level;
>> +	kvm_pfn_t pfn;
>> +	gpa_t gpa_end;
>> +
>> +	gpa_end = gpa + page_level_size(level);
>> +
>> +	while (gpa < gpa_end) {
>> +		/*
>> +		 * Get the pfn and level for the gpa from the nested page table.
>> +		 *
>> +		 * If the TDP walk failed, then its safe to say that we don't have a valid
>> +		 * mapping for the gpa in the nested page table. Create a fault to map the
>> +		 * page is nested page table.
>> +		 */
>> +		if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level)) {
>> +			pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, level);
>> +			if (is_error_noslot_pfn(pfn))
>> +				goto out;
>> +
>> +			if (!kvm_mmu_get_tdp_walk(vcpu, gpa, &pfn, &tdp_level))
>> +				goto out;
>> +		}
>> +
>> +		/* Adjust the level so that we don't go higher than the backing page level */
>> +		level = min_t(size_t, level, tdp_level);
>> +
>> +		write_lock(&kvm->mmu_lock);
> 
> Retrieving the PFN and level outside of mmu_lock is not correct.  Because the
> pages are pinned and the VMM is not malicious, it will function as intended, but
> it is far from correct.
> 

Good point, I should have retrieved the pfn and level inside the lock.

> The overall approach also feels wrong, e.g. a guest won't be able to convert a
> 2mb chunk back to a 2mb large page if KVM mapped the GPA as a 4kb page in the
> past (from a different conversion).
> 

Maybe I am missing something, I am not able to follow 'guest won't be 
able to convert a 2mb chunk back to a 2mb large page'. The page-size 
used inside the guest have to relationship with the RMP/NPT page-size. 
e.g, a guest can validate the page range as a 4k and still map the page 
range as a 2mb or 1gb in its pagetable.


> I'd also strongly prefer to have a common flow between SNP and TDX for converting
> between shared/prviate.
> 
> I'll circle back to this next week, it'll probably take a few hours of staring
> to figure out a solution, if a common one for SNP+TDX is even possible.
> 

Sounds good.

>> +
>> +		switch (op) {
>> +		case SNP_PAGE_STATE_SHARED:
>> +			rc = snp_make_page_shared(vcpu, gpa, pfn, level);
>> +			break;
>> +		case SNP_PAGE_STATE_PRIVATE:
>> +			rc = snp_make_page_private(vcpu, gpa, pfn, level);
>> +			break;
>> +		default:
>> +			rc = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		write_unlock(&kvm->mmu_lock);
>> +
>> +		if (rc) {
>> +			pr_err_ratelimited("Error op %d gpa %llx pfn %llx level %d rc %d\n",
>> +					   op, gpa, pfn, level, rc);
>> +			goto out;
>> +		}
>> +
>> +		gpa = gpa + page_level_size(level);
>> +	}
>> +
>> +out:
>> +	return rc;
>> +}
>> +
>>   static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>>   {
>>   	struct vmcb_control_area *control = &svm->vmcb->control;
>> @@ -2941,6 +3063,25 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>>   				  GHCB_MSR_INFO_POS);
>>   		break;
>>   	}
>> +	case GHCB_MSR_PSC_REQ: {
>> +		gfn_t gfn;
>> +		int ret;
>> +		u8 op;
>> +
>> +		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_GFN_MASK, GHCB_MSR_PSC_GFN_POS);
>> +		op = get_ghcb_msr_bits(svm, GHCB_MSR_PSC_OP_MASK, GHCB_MSR_PSC_OP_POS);
>> +
>> +		ret = __snp_handle_psc(vcpu, op, gfn_to_gpa(gfn), PG_LEVEL_4K);
>> +
>> +		/* If failed to change the state then spec requires to return all F's */
> 
> That doesn't mesh with what I could find:
> 
>    o 0x015 – SNP Page State Change Response
>      ▪ GHCBData[63:32] – Error code
>      ▪ GHCBData[31:12] – Reserved, must be zero
>    Written by the hypervisor in response to a Page State Change request. Any non-
>    zero value for the error code indicates that the page state change was not
>    successful.
> 
> And if "all Fs" is indeed the error code, 'int ret' probably only works by luck
> since the return value is a 64-bit value, where as ret is a 32-bit signed int.
> 
>> +		if (ret)
>> +			ret = -1;
> 
> Uh, this is fubar.   You've created a shadow of 'ret', i.e. the outer ret is likely
> uninitialized.
> 

Ah, let me fix it in next rev.

thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ