linux-kernel - Re: [PATCH v3 2/2] s390/mm: re-enable the shared zeropage for !PV and !skeys KVM guests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8533cb18-42ff-42bc-b9e5-b0537aa51b21@redhat.com>
Date: Mon, 15 Apr 2024 21:14:03 +0200
From: David Hildenbrand <david@...hat.com>
To: Alexander Gordeev <agordeev@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 Christian Borntraeger <borntraeger@...ux.ibm.com>,
 Janosch Frank <frankja@...ux.ibm.com>,
 Claudio Imbrenda <imbrenda@...ux.ibm.com>, Heiko Carstens
 <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
 Andrew Morton <akpm@...ux-foundation.org>, Peter Xu <peterx@...hat.com>,
 Sven Schnelle <svens@...ux.ibm.com>,
 Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
 Andrea Arcangeli <aarcange@...hat.com>, kvm@...r.kernel.org,
 linux-s390@...r.kernel.org
Subject: Re: [PATCH v3 2/2] s390/mm: re-enable the shared zeropage for !PV and
 !skeys KVM guests

On 15.04.24 20:24, Alexander Gordeev wrote:
> On Thu, Apr 11, 2024 at 06:14:41PM +0200, David Hildenbrand wrote:
> 
> David, could you please clarify the below questions?

Sure, let me take a look if we're still missing to handle some corner cases correctly.

> 
>> +static int __s390_unshare_zeropages(struct mm_struct *mm)
>> +{
>> +	struct vm_area_struct *vma;
>> +	VMA_ITERATOR(vmi, mm, 0);
>> +	unsigned long addr;
>> +	int rc;
>> +
>> +	for_each_vma(vmi, vma) {
>> +		/*
>> +		 * We could only look at COW mappings, but it's more future
>> +		 * proof to catch unexpected zeropages in other mappings and
>> +		 * fail.
>> +		 */
>> +		if ((vma->vm_flags & VM_PFNMAP) || is_vm_hugetlb_page(vma))
>> +			continue;
>> +		addr = vma->vm_start;
>> +
>> +retry:
>> +		rc = walk_page_range_vma(vma, addr, vma->vm_end,
>> +					 &find_zeropage_ops, &addr);
>> +		if (rc <= 0)
>> +			continue;
> 
> So in case an error is returned for the last vma, __s390_unshare_zeropage()
> finishes with that error. By contrast, the error for a non-last vma would
> be ignored?

Right, it looks a bit off. walk_page_range_vma() shouldn't fail
unless find_zeropage_pte_entry() would fail -- which would also be
very unexpected.

To handle it cleanly in case we would ever get a weird zeropage where we
don't expect it, we should probably just exit early.

Something like the following (not compiled, addressing the comment below):


 From b97cd17a3697ac402b07fe8d0033f3c10fbd6829 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@...hat.com>
Date: Mon, 15 Apr 2024 20:56:20 +0200
Subject: [PATCH] fixup

Signed-off-by: David Hildenbrand <david@...hat.com>
---
  arch/s390/mm/gmap.c | 13 ++++++++-----
  1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 9233b0acac89..3e3322a9cc32 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2618,7 +2618,8 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
  	struct vm_area_struct *vma;
  	VMA_ITERATOR(vmi, mm, 0);
  	unsigned long addr;
-	int rc;
+	vm_fault_t rc;
+	int zero_page;
  
  	for_each_vma(vmi, vma) {
  		/*
@@ -2631,9 +2632,11 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
  		addr = vma->vm_start;
  
  retry:
-		rc = walk_page_range_vma(vma, addr, vma->vm_end,
-					 &find_zeropage_ops, &addr);
-		if (rc <= 0)
+		zero_page = walk_page_range_vma(vma, addr, vma->vm_end,
+						&find_zeropage_ops, &addr);
+		if (zero_page < 0)
+			return zero_page;
+		else if (!zero_page)
  			continue;
  
  		/* addr was updated by find_zeropage_pte_entry() */
@@ -2656,7 +2659,7 @@ static int __s390_unshare_zeropages(struct mm_struct *mm)
  		goto retry;
  	}
  
-	return rc;
+	return 0;
  }
  
  static int __s390_disable_cow_sharing(struct mm_struct *mm)
-- 
2.44.0



> 
>> +
>> +		/* addr was updated by find_zeropage_pte_entry() */
>> +		rc = handle_mm_fault(vma, addr,
>> +				     FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE,
>> +				     NULL);
>> +		if (rc & VM_FAULT_OOM)
>> +			return -ENOMEM;
> 
> Heiko pointed out that rc type is inconsistent vs vm_fault_t returned by

Right, let's use another variable for that.

> handle_mm_fault(). While fixing it up, I've got concerned whether is it
> fine to continue in case any other error is met (including possible future
> VM_FAULT_xxxx)?

Such future changes would similarly break break_ksm(). Staring at it, I do wonder
if break_ksm() should be handling VM_FAULT_HWPOISON ... very likely we should
handle it and fail -- we might get an MC while copying from the source page.

VM_FAULT_HWPOISON on the shared zeropage would imply a lot of trouble, so
I'm not concerned about that for the case here, but handling it in the future
would be cleaner.

Note that we always retry the lookup, so we won't just skip a zeropage on unexpected
errors.

We could piggy-back on vm_fault_to_errno(). We could use
vm_fault_to_errno(rc, FOLL_HWPOISON), and only continue (retry) if the rc is 0 or
-EFAULT, otherwise fail with the returned error.

But I'd do that as a follow up, and also use it in break_ksm() in the same fashion.

-- 
Cheers,

David / dhildenb