linux-kernel - Re: [PATCH Part2 RFC v4 10/40] x86/fault: Add support to handle the RMP fault for user address

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2b4accb6-b68e-02d3-6fed-975f90558099@amd.com>
Date:   Mon, 12 Jul 2021 10:43:50 -0500
From:   Brijesh Singh <brijesh.singh@....com>
To:     Dave Hansen <dave.hansen@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-efi@...r.kernel.org, platform-driver-x86@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org
Cc:     brijesh.singh@....com, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <thomas.lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sergio Lopez <slp@...hat.com>, Peter Gonda <pgonda@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Dov Murik <dovmurik@...ux.ibm.com>,
        Tobin Feldman-Fitzthum <tobin@....com>,
        Borislav Petkov <bp@...en8.de>,
        Michael Roth <michael.roth@....com>,
        Vlastimil Babka <vbabka@...e.cz>, tony.luck@...el.com,
        npmccallum@...hat.com, brijesh.ksingh@...il.com
Subject: Re: [PATCH Part2 RFC v4 10/40] x86/fault: Add support to handle the
 RMP fault for user address

Hi Dave,


On 7/8/21 11:16 AM, Dave Hansen wrote:
> 
> "SIGBUG"?

Its typo, it should be SIGBUS

>> +
>> +	if (unlikely(!cpu_feature_enabled(X86_FEATURE_SEV_SNP)))
>> +		return RMP_FAULT_KILL;
> 
> Shouldn't this be a WARN_ON_ONCE()?  How can we get RMP faults without
> SEV-SNP?

Yes, we should *not* get RMP fault if SEV-SNP is not enabled. I can use 
the WARN_ON_ONCE().


> 
>> +	/* Get the native page level */
>> +	pte = lookup_address_in_mm(current->mm, address, &level);
>> +	if (unlikely(!pte))
>> +		return RMP_FAULT_KILL;
> 
> What would this mean?  There was an RMP fault on a non-present page?
> How could that happen?  What if there was a race between an unmapping
> event and the RMP fault delivery?

We should not have RMP fault for non-present pages. But you have a good 
point that there maybe a race between the unmap event and RMP fault. 
Instead of terminating the process we should simply retry.


> 
>> +	pfn = pte_pfn(*pte);
>> +	if (level > PG_LEVEL_4K) {
>> +		mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
>> +		pfn |= (address >> PAGE_SHIFT) & mask;
>> +	}
> 
> This looks inherently racy.  What happens if there are two parallel RMP
> faults on the same 2M page.  One of them splits the page tables, the
> other gets a fault for an already-split page table.
>  > Is that handled here somehow?

Yes, in this particular case we simply retry and hardware should 
re-evaluate the page level and take the corrective action.


> 
>> +	/* Get the page level from the RMP entry. */
>> +	e = snp_lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
>> +	if (!e)
>> +		return RMP_FAULT_KILL;
> 
> The snp_lookup_page_in_rmptable() failure cases looks WARN-worthly.
> Either you're doing a lookup for something not *IN* the RMP table, or
> you don't support SEV-SNP, in which case you shouldn't be in this code
> in the first place.

Noted.

> 
>> +	/*
>> +	 * Check if the RMP violation is due to the guest private page access.
>> +	 * We can not resolve this RMP fault, ask to kill the guest.
>> +	 */
>> +	if (rmpentry_assigned(e))
>> +		return RMP_FAULT_KILL;
> 
> No "We's", please.  Speak in imperative voice.

Noted.

> 
>> +	/*
>> +	 * The backing page level is higher than the RMP page level, request
>> +	 * to split the page.
>> +	 */
>> +	if (level > rmp_level)
>> +		return RMP_FAULT_PAGE_SPLIT;
> 
> This can theoretically trigger on a hugetlbfs page.  Right?
> 

Yes, theoretically.

In the current implementation, the VMM is enlightened to not use the 
hugetlbfs for backing page when creating the SEV-SNP guests.


> I thought I asked about this before... more below...
> 
>> +	return RMP_FAULT_RETRY;
>> +}
>> +
>>   /*
>>    * Handle faults in the user portion of the address space.  Nothing in here
>>    * should check X86_PF_USER without a specific justification: for almost
>> @@ -1298,6 +1350,7 @@ void do_user_addr_fault(struct pt_regs *regs,
>>   	struct task_struct *tsk;
>>   	struct mm_struct *mm;
>>   	vm_fault_t fault;
>> +	int ret;
>>   	unsigned int flags = FAULT_FLAG_DEFAULT;
>>   
>>   	tsk = current;
>> @@ -1378,6 +1431,22 @@ void
> (struct pt_regs *regs,
>>   	if (error_code & X86_PF_INSTR)
>>   		flags |= FAULT_FLAG_INSTRUCTION;
>>   
>> +	/*
>> +	 * If its an RMP violation, try resolving it.
>> +	 */
>> +	if (error_code & X86_PF_RMP) {
>> +		ret = handle_user_rmp_page_fault(error_code, address);
>> +		if (ret == RMP_FAULT_PAGE_SPLIT) {
>> +			flags |= FAULT_FLAG_PAGE_SPLIT;
>> +		} else if (ret == RMP_FAULT_KILL) {
>> +			fault |= VM_FAULT_SIGBUS;
>> +			do_sigbus(regs, error_code, address, fault);
>> +			return;
>> +		} else {
>> +			return;
>> +		}
>> +	}
> 
> Why not just have handle_user_rmp_page_fault() return a VM_FAULT_* code
> directly?
> 

I don't have any strong reason against it. In next rev, I can update to 
use the VM_FAULT_* code and call the do_sigbus() etc.

> I also suspect you can just set VM_FAULT_SIGBUS and let the do_sigbus()
> call later on in the function do its work.
>>   
>> +static int handle_split_page_fault(struct vm_fault *vmf)
>> +{
>> +	if (!IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
>> +		return VM_FAULT_SIGBUS;
>> +
>> +	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
>> +	return 0;
>> +}
> 
> What will this do when you hand it a hugetlbfs page?
> 

VMM is updated to not use the hugetlbfs when creating SEV-SNP guests. 
So, we should not run into it.

-Brijesh