Message-ID: <CAJuCfpGqt1V5puRMhLkjG6F2T4xtsDY8qy--ZfBPNL9kxPyWtg@mail.gmail.com>
Date:   Thu, 21 Sep 2023 18:04:30 +0000
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Matthew Wilcox <willy@...radead.org>, akpm@...ux-foundation.org,
        viro@...iv.linux.org.uk, brauner@...nel.org, shuah@...nel.org,
        aarcange@...hat.com, lokeshgidra@...gle.com, peterx@...hat.com,
        hughd@...gle.com, mhocko@...e.com, axelrasmussen@...gle.com,
        rppt@...nel.org, Liam.Howlett@...cle.com, jannh@...gle.com,
        zhangpeng362@...wei.com, bgeffon@...gle.com,
        kaleshsingh@...gle.com, ngeoffray@...gle.com, jdduke@...gle.com,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        kernel-team@...roid.com
Subject: Re: [PATCH 2/3] userfaultfd: UFFDIO_REMAP uABI

On Thu, Sep 14, 2023 at 6:45 PM David Hildenbrand <david@...hat.com> wrote:
>
> On 14.09.23 20:43, David Hildenbrand wrote:
> > On 14.09.23 20:11, Matthew Wilcox wrote:
> >> On Thu, Sep 14, 2023 at 08:26:12AM -0700, Suren Baghdasaryan wrote:
> >>> +++ b/include/linux/userfaultfd_k.h
> >>> @@ -93,6 +93,23 @@ extern int mwriteprotect_range(struct mm_struct *dst_mm,
> >>>    extern long uffd_wp_range(struct vm_area_struct *vma,
> >>>                       unsigned long start, unsigned long len, bool enable_wp);
> >>>
> >>> +/* remap_pages */
> >>> +extern void double_pt_lock(spinlock_t *ptl1, spinlock_t *ptl2);
> >>> +extern void double_pt_unlock(spinlock_t *ptl1, spinlock_t *ptl2);
> >>> +extern ssize_t remap_pages(struct mm_struct *dst_mm,
> >>> +                      struct mm_struct *src_mm,
> >>> +                      unsigned long dst_start,
> >>> +                      unsigned long src_start,
> >>> +                      unsigned long len, __u64 flags);
> >>> +extern int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> >>> +                           struct mm_struct *src_mm,
> >>> +                           pmd_t *dst_pmd, pmd_t *src_pmd,
> >>> +                           pmd_t dst_pmdval,
> >>> +                           struct vm_area_struct *dst_vma,
> >>> +                           struct vm_area_struct *src_vma,
> >>> +                           unsigned long dst_addr,
> >>> +                           unsigned long src_addr);
> >>
> >> Drop the 'extern' markers from function declarations.
> >>
> >>> +int remap_pages_huge_pmd(struct mm_struct *dst_mm,
> >>> +                    struct mm_struct *src_mm,
> >>> +                    pmd_t *dst_pmd, pmd_t *src_pmd,
> >>> +                    pmd_t dst_pmdval,
> >>> +                    struct vm_area_struct *dst_vma,
> >>> +                    struct vm_area_struct *src_vma,
> >>> +                    unsigned long dst_addr,
> >>> +                    unsigned long src_addr)
> >>> +{
> >>> +   pmd_t _dst_pmd, src_pmdval;
> >>> +   struct page *src_page;
> >>> +   struct anon_vma *src_anon_vma, *dst_anon_vma;
> >>> +   spinlock_t *src_ptl, *dst_ptl;
> >>> +   pgtable_t pgtable;
> >>> +   struct mmu_notifier_range range;
> >>> +
> >>> +   src_pmdval = *src_pmd;
> >>> +   src_ptl = pmd_lockptr(src_mm, src_pmd);
> >>> +
> >>> +   BUG_ON(!pmd_trans_huge(src_pmdval));
> >>> +   BUG_ON(!pmd_none(dst_pmdval));
> >>> +   BUG_ON(!spin_is_locked(src_ptl));
> >>> +   mmap_assert_locked(src_mm);
> >>> +   mmap_assert_locked(dst_mm);
> >>> +   BUG_ON(src_addr & ~HPAGE_PMD_MASK);
> >>> +   BUG_ON(dst_addr & ~HPAGE_PMD_MASK);
> >>> +
> >>> +   src_page = pmd_page(src_pmdval);
> >>> +   BUG_ON(!PageHead(src_page));
> >>> +   BUG_ON(!PageAnon(src_page));
> >>
> >> Better to add a src_folio = page_folio(src_page);
> >> and then folio_test_anon() here.
> >>
> >>> +   if (unlikely(page_mapcount(src_page) != 1)) {
> >>
> >> Brr, this is going to miss PTE mappings of this folio.  I think you
> >> actually want folio_mapcount() instead, although it'd be more efficient
> >> to look at folio->_entire_mapcount == 1 and _nr_pages_mapped == 0.
> >> Not sure what a good name for that predicate would be.
> >
> > We have
> >
> >    * It only works on non shared anonymous pages because those can
> >    * be relocated without generating non linear anon_vmas in the rmap
> >    * code.
> >    *
> >    * It provides a zero copy mechanism to handle userspace page faults.
> >    * The source vma pages should have mapcount == 1, which can be
> >    * enforced by using madvise(MADV_DONTFORK) on src vma.
> >
> > Use PageAnonExclusive(). As long as KSM is not involved and you don't
> > use fork(), that flag should be good enough for that use case here.
> >
> ... and similarly don't do any of that swapcount stuff and only check if
> the swap pte is anon exclusive.

I'm preparing v2 and this is the only part left for me to address, but
I'm not clear how. David, could you please clarify how I should check
that the swap pte is exclusive without consulting the swapcount?

>
> --
> Cheers,
>
> David / dhildenb
>
