linux-kernel - Re: [PATCH v2 2/2] mm/gup/writeback: add callbacks for inaccessible pages

Date:   Mon, 2 Mar 2020 23:59:32 -0800
From:   John Hubbard <jhubbard@...dia.com>
To:     Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        <linux-next@...r.kernel.org>, <akpm@...ux-foundation.org>,
        <jack@...e.cz>, <kirill@...temov.name>
CC:     <borntraeger@...ibm.com>, <david@...hat.com>,
        <aarcange@...hat.com>, <linux-mm@...ck.org>,
        <frankja@...ux.ibm.com>, <sfr@...b.auug.org.au>,
        <linux-kernel@...r.kernel.org>, <linux-s390@...r.kernel.org>,
        Will Deacon <will@...nel.org>
Subject: Re: [PATCH v2 2/2] mm/gup/writeback: add callbacks for inaccessible
 pages

On 3/2/20 4:25 PM, Claudio Imbrenda wrote:
> With the introduction of protected KVM guests on s390 there is now a
> concept of inaccessible pages. These pages need to be made accessible
> before the host can access them.
> 
> While cpu accesses will trigger a fault that can be resolved, I/O
> accesses will just fail.  We need to add a callback into architecture
> code for places that will do I/O, namely when writeback is started or
> when a page reference is taken.
> 
> This is not only to enable paging, file backing etc, it is also
> necessary to protect the host against a malicious user space.  For
> example a bad QEMU could simply start direct I/O on such protected
> memory.  We do not want userspace to be able to trigger I/O errors and
> thus the logic is "whenever somebody accesses that page (gup) or does
> I/O, make sure that this page can be accessed".  When the guest tries
> to access that page we will wait in the page fault handler for
> writeback to have finished and for the page_ref to be the expected
> value.
> 
> On s390x the function is not supposed to fail, so it is ok to use a
> WARN_ON on failure. If we ever need some more finegrained handling
> we can tackle this when we know the details.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@...ux.ibm.com>
> Acked-by: Will Deacon <will@...nel.org>
> Reviewed-by: David Hildenbrand <david@...hat.com>
> Reviewed-by: Christian Borntraeger <borntraeger@...ibm.com>
> ---
>   include/linux/gfp.h |  6 ++++++
>   mm/gup.c            | 27 ++++++++++++++++++++++++---
>   mm/page-writeback.c |  5 +++++
>   3 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index e5b817cb86e7..be2754841369 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -485,6 +485,12 @@ static inline void arch_free_page(struct page *page, int order) { }
>   #ifndef HAVE_ARCH_ALLOC_PAGE
>   static inline void arch_alloc_page(struct page *page, int order) { }
>   #endif
> +#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> +static inline int arch_make_page_accessible(struct page *page)
> +{
> +	return 0;
> +}
> +#endif
>   
>   struct page *
>   __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> diff --git a/mm/gup.c b/mm/gup.c
> index 81a95fbe9901..15c47e0e86f8 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -413,6 +413,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>   	struct page *page;
>   	spinlock_t *ptl;
>   	pte_t *ptep, pte;
> +	int ret;
>   
>   	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
>   	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
> @@ -471,8 +472,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>   		if (is_zero_pfn(pte_pfn(pte))) {
>   			page = pte_page(pte);
>   		} else {
> -			int ret;
> -
>   			ret = follow_pfn_pte(vma, address, ptep, flags);
>   			page = ERR_PTR(ret);
>   			goto out;
> @@ -480,7 +479,6 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>   	}
>   
>   	if (flags & FOLL_SPLIT && PageTransCompound(page)) {
> -		int ret;
>   		get_page(page);
>   		pte_unmap_unlock(ptep, ptl);
>   		lock_page(page);
> @@ -497,6 +495,19 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>   		page = ERR_PTR(-ENOMEM);
>   		goto out;
>   	}
> +	/*
> +	 * We need to make the page accessible if we are actually going to
> +	 * poke at its content (pin), otherwise we can leave it inaccessible.
> +	 * If we cannot make the page accessible, fail.
> +	 */
> +	if (flags & FOLL_PIN) {
> +		ret = arch_make_page_accessible(page);
> +		if (ret) {
> +			unpin_user_page(page);
> +			page = ERR_PTR(ret);
> +			goto out;
> +		}
> +	}


That looks good.


>   	if (flags & FOLL_TOUCH) {
>   		if ((flags & FOLL_WRITE) &&
>   		    !pte_dirty(pte) && !PageDirty(page))
> @@ -2162,6 +2173,16 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>   
>   		VM_BUG_ON_PAGE(compound_head(page) != head, page);
>   
> +		/*
> +		 * We need to make the page accessible if we are actually
> +		 * going to poke at its content (pin), otherwise we can
> +		 * leave it inaccessible. If the page cannot be made
> +		 * accessible, fail.
> +		 */


This part looks good, so these two points are just nits:

The comment mostly repeats what the code already does. How about:

		/*
		 * We need to make the page accessible if and only if we are
		 * going to access its content (the FOLL_PIN case). Please see
		 * Documentation/core-api/pin_user_pages.rst for details.
		 */


> +		if ((flags & FOLL_PIN) && arch_make_page_accessible(page)) {
> +			unpin_user_page(page);
> +			goto pte_unmap;
> +		}


Your style earlier in the patch was easier on the reader. Why not stay consistent
with that (and with this file, which also tends to do this)? So:

		if (flags & FOLL_PIN) {
			ret = arch_make_page_accessible(page);
			if (ret) {
				unpin_user_page(page);
				goto pte_unmap;
			}
		}
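
To illustrate the pattern under discussion (take the pin first, then make the
page accessible, and drop the pin again if that fails), here is a minimal
userspace sketch. It is not kernel code: struct demo_page, the DEMO_FOLL_*
flags, and both demo_* functions are hypothetical stand-ins invented for this
example, and the error value is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct demo_page {
	int pincount;	/* models the reference taken for the pin */
	int accessible;	/* 1 if the host may do I/O on the page */
	int can_export;	/* 0 models an arch failure to make it accessible */
};

/* Hypothetical flag values; they only mirror the FOLL_GET/FOLL_PIN split. */
#define DEMO_FOLL_GET 0x1u
#define DEMO_FOLL_PIN 0x2u

/* Stand-in for arch_make_page_accessible(): 0 on success, -errno on failure. */
static int demo_make_page_accessible(struct demo_page *page)
{
	assert(page != NULL);
	if (page->accessible)
		return 0;
	if (!page->can_export)
		return -EFAULT;
	page->accessible = 1;
	return 0;
}

/* Takes a reference, and only for the pin case makes the page accessible. */
static int demo_follow_page(struct demo_page *page, unsigned int flags)
{
	int ret;

	assert(page != NULL);
	page->pincount++;	/* the reference is already held at this point */

	if (flags & DEMO_FOLL_PIN) {
		ret = demo_make_page_accessible(page);
		if (ret) {
			page->pincount--;	/* undo the pin on failure */
			return ret;
		}
	}
	return 0;
}
```

Splitting the call and the check into two statements, as in this sketch, keeps
the error value in ret available to the failure path.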




>   		SetPageReferenced(page);
>   		pages[*nr] = page;
>   		(*nr)++;
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ab5a3cee8ad3..8384be5a2758 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2807,6 +2807,11 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
>   		inc_zone_page_state(page, NR_ZONE_WRITE_PENDING);
>   	}
>   	unlock_page_memcg(page);
> +	/*
> +	 * If writeback has been triggered on a page that cannot be made
> +	 * accessible, it is too late.
> +	 */
> +	WARN_ON(arch_make_page_accessible(page));


I'm not deep enough into this area to know a) whether this is correct, and b) whether
there are any other places that need arch_make_page_accessible() calls. So I'll rely on
other reviewers to help check on that.
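
For readers unfamiliar with the hook pattern the patch uses, here is a minimal
userspace sketch of the "#ifndef HAVE_ARCH_..." default-stub idiom, plus a call
site that can only warn on failure. Every name in it (the DEMO_/demo_ macro,
struct, and functions) is hypothetical and made up for this example; it models
the shape of the code only and makes no claim about what s390 actually does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct demo_page {
	int accessible;
};

/*
 * An "architecture" opts in by defining the macro and its own function
 * before the generic fallback below is seen. Removing these lines makes
 * the no-op stub take over instead.
 */
#define DEMO_HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static int demo_arch_make_page_accessible(struct demo_page *page)
{
	assert(page != NULL);
	page->accessible = 1;
	return 0;
}

/* Generic fallback: architectures without the concept do nothing. */
#ifndef DEMO_HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int demo_arch_make_page_accessible(struct demo_page *page)
{
	(void)page;
	return 0;
}
#endif

/*
 * Models the writeback call site: the state change has already happened,
 * so a failure can only be reported, not undone. Returns 1 if it warned.
 */
static int demo_start_writeback(struct demo_page *page)
{
	int err = demo_arch_make_page_accessible(page);

	if (err) {
		fprintf(stderr, "warning: page not accessible (%d)\n", err);
		return 1;
	}
	return 0;
}
```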


>   	return ret;
>   
>   }
> 

Anyway, I don't see any problems, and as I said, those documentation and style points are
just nitpicks, not bugs.


thanks,
-- 
John Hubbard
NVIDIA