lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210201232155.GL260413@xz-x1>
Date:   Mon, 1 Feb 2021 18:21:55 -0500
From:   Peter Xu <peterx@...hat.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Axel Rasmussen <axelrasmussen@...gle.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Chinwen Chang <chinwen.chang@...iatek.com>,
        Huang Ying <ying.huang@...el.com>,
        Ingo Molnar <mingo@...hat.com>, Jann Horn <jannh@...gle.com>,
        Jerome Glisse <jglisse@...hat.com>,
        Lokesh Gidra <lokeshgidra@...gle.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michal Koutný <mkoutny@...e.com>,
        Michel Lespinasse <walken@...gle.com>,
        Mike Rapoport <rppt@...ux.vnet.ibm.com>,
        Nicholas Piggin <npiggin@...il.com>, Shaohua Li <shli@...com>,
        Shawn Anastasio <shawn@...stas.io>,
        Steven Rostedt <rostedt@...dmis.org>,
        Steven Price <steven.price@....com>,
        Vlastimil Babka <vbabka@...e.cz>, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        Adam Ruprecht <ruprecht@...gle.com>,
        Cannon Matthews <cannonmatthews@...gle.com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Oliver Upton <oupton@...gle.com>
Subject: Re: [PATCH v3 4/9] hugetlb/userfaultfd: Unshare all pmds for
 hugetlbfs when register wp

On Mon, Feb 01, 2021 at 02:33:20PM -0800, Mike Kravetz wrote:
> On 1/28/21 2:48 PM, Axel Rasmussen wrote:
> > From: Peter Xu <peterx@...hat.com>
> > 
> > Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
> > userfaultfd-wp is always based on pgtable entries, so they cannot be shared.
> > 
> > Walk the hugetlb range and unshare all such mappings if there is, right before
> > UFFDIO_REGISTER will succeed and return to userspace.
> > 
> > This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing
> > is completely disabled for userfaultfd-wp registered range.
> > 
> > Signed-off-by: Peter Xu <peterx@...hat.com>
> > Signed-off-by: Axel Rasmussen <axelrasmussen@...gle.com>
> > ---
> >  fs/userfaultfd.c             | 45 ++++++++++++++++++++++++++++++++++++
> >  include/linux/mmu_notifier.h |  1 +
> >  2 files changed, 46 insertions(+)
> > 
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 894cc28142e7..2c6706ac2504 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/sched/signal.h>
> >  #include <linux/sched/mm.h>
> >  #include <linux/mm.h>
> > +#include <linux/mmu_notifier.h>
> >  #include <linux/poll.h>
> >  #include <linux/slab.h>
> >  #include <linux/seq_file.h>
> > @@ -1190,6 +1191,47 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
> >  	}
> >  }
> >  
> > +/*
> > + * This function will unconditionally remove all the shared pmd pgtable entries
> > + * within the specific vma for a hugetlbfs memory range.
> > + */
> > +static void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +	struct hstate *h = hstate_vma(vma);
> > +	unsigned long sz = huge_page_size(h);
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	struct mmu_notifier_range range;
> > +	unsigned long address;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +
> 
> Perhaps we should add a quick to see if vma is sharable.  Might be as
> simple as !(vma->vm_flags & VM_MAYSHARE).  I see a comment/question in
> a later patch about only doing minor fault processing on shared mappings.

Yes, that comment was majorly about shmem though - I believe shared case should
still be the major one, especially for hugetlbfs.

So what I was thinking is something like: one non-uffd process use shared
mapping of the file, meanwhile the other uffd process used private mapping on
the same file.  When the uffd process access page it could fault in the page
cache and continued by UFFDIO_CONTINUE, however when it writes it'll COW into
private pages.  Something like that.  Not sure whether it's useful, but I just
don't see why we should block that case.

> 
> Code below looks fine, but it would be a wast to do all that for a vma
> that could not be shared.

Right, still better to check it.

Mike, I agree with all your comments on the initial 4 patches, thanks for the
input!  To make Axel's life easier, I've modified them locally and pushed since
after all I'll do it in my series too (I also picked Mike's r-b on patch 3):

https://github.com/xzpeter/linux/commits/uffd-wp-shmem-hugetlbfs

Axel, feel free to fetch from it directly.

Thanks,

-- 
Peter Xu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ