linux-kernel - Re: [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO


Message-ID: <YJw3MH2kTftwvlGa@t490s>
Date:   Wed, 12 May 2021 16:14:40 -0400
From:   Peter Xu <peterx@...hat.com>
To:     Mina Almasry <almasrymina@...gle.com>
Cc:     Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>,
        open list <linux-kernel@...r.kernel.org>,
        Axel Rasmussen <axelrasmussen@...gle.com>
Subject: Re: [PATCH] mm, hugetlb: fix resv_huge_pages underflow on UFFDIO_COPY

Mina,

On Wed, May 12, 2021 at 12:42:32PM -0700, Mina Almasry wrote:
> > >> @@ -4868,30 +4869,39 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> > >>                             struct page **pagep)
> > >>  {
> > >>         bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
> > >> -       struct address_space *mapping;
> > >> -       pgoff_t idx;
> > >> +       struct hstate *h = hstate_vma(dst_vma);
> > >> +       struct address_space *mapping = dst_vma->vm_file->f_mapping;
> > >> +       pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> > >>         unsigned long size;
> > >>         int vm_shared = dst_vma->vm_flags & VM_SHARED;
> > >> -       struct hstate *h = hstate_vma(dst_vma);
> > >>         pte_t _dst_pte;
> > >>         spinlock_t *ptl;
> > >> -       int ret;
> > >> +       int ret = -ENOMEM;
> > >>         struct page *page;
> > >>         int writable;
> > >>
> > >> -       mapping = dst_vma->vm_file->f_mapping;
> > >> -       idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> > >> +       /* Out parameter. */
> > >> +       WARN_ON(*pagep);
> > >
> > > I don't think this warning works, because we do set *pagep, in the
> > > copy_huge_page_from_user failure case. In that case, the following
> > > happens:
> > >
> > > 1. We set *pagep, and return immediately.
> > > 2. Our caller notices this particular error, drops mmap_lock, and then
> > > calls us again with *pagep set.
> > >
> > > In this path, we're supposed to just re-use this existing *pagep
> > > instead of allocating a second new page.
> > >
> > > I think this also means we need to keep the "else" case where *pagep
> > > is set below.
> > >
> >
> > +1 to Peter's comment.
> >
> 
> Gah, sorry about that. I'll fix in v2.
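
The retry path described in the quoted review can be modelled roughly as below. This is a userspace sketch, not the kernel code: toy_install(), toy_caller(), struct toy_page and the toy_allocs counter are hypothetical names, and the use of -ENOENT as "this particular error" is an assumption made for illustration. It only shows why a non-NULL *pagep on entry is legitimate and why the "else" branch has to stay.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a huge page; `filled` means the user copy succeeded. */
struct toy_page {
	bool filled;
};

/* Counts "allocations" so a caller can verify the page is re-used on retry. */
static int toy_allocs;

/*
 * Toy analogue of the callee. If *pagep is NULL it "allocates" (takes `pool`)
 * and tries the copy. If the copy faults it hands the page back through
 * *pagep and returns -ENOENT (assumed error code) so the caller can retry.
 * If *pagep is already set, it re-uses that page instead of allocating.
 */
static int toy_install(struct toy_page **pagep, struct toy_page *pool,
		       bool copy_faults)
{
	struct toy_page *page;

	if (!*pagep) {
		page = pool;
		toy_allocs++;
		if (copy_faults) {
			*pagep = page;	/* step 1: set *pagep, return at once */
			return -ENOENT;
		}
		page->filled = true;
	} else {
		page = *pagep;		/* retry: re-use the existing page */
		*pagep = NULL;
	}

	return page->filled ? 0 : -EFAULT;
}

/* Toy analogue of the caller: on the special error, copy unlocked, then retry. */
static int toy_caller(struct toy_page *pool, bool first_copy_faults)
{
	struct toy_page *page = NULL;
	int ret = toy_install(&page, pool, first_copy_faults);

	if (ret == -ENOENT) {
		/* step 2: the lock would be dropped here and the copy redone */
		page->filled = true;
		ret = toy_install(&page, pool, false);
	}
	return ret;
}
```

With this shape, an unconditional WARN_ON(*pagep) at function entry would fire on every retry, which is the objection raised above.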

I have a question regarding v1: how do you guarantee that
huge_add_to_page_cache() won't still fail, even though you check before the
page allocation?  Say, what if a page gets inserted into the page cache after
hugetlbfs_pagecache_present() (which is newly added in your v1) but before
huge_add_to_page_cache()?
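
To make the window concrete, here is a small userspace model of the check-then-insert pattern. It is only a sketch: struct toy_cache and every toy_* function are hypothetical stand-ins, not the hugetlb API, and the "concurrent" inserter is a callback run between the check and the insert so the race triggers deterministically.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS 8

/* Toy stand-in for a page cache: one pointer per index, NULL means absent. */
struct toy_cache {
	const void *slot[TOY_NSLOTS];
};

/* Analogue of a "present?" check: a snapshot only, it reserves nothing. */
static bool toy_cache_present(const struct toy_cache *c, size_t idx)
{
	return c->slot[idx] != NULL;
}

/* Analogue of an insert that fails if the index is already populated. */
static int toy_cache_add(struct toy_cache *c, size_t idx, const void *page)
{
	if (c->slot[idx])
		return -EEXIST;
	c->slot[idx] = page;
	return 0;
}

/* Hypothetical concurrent inserter that populates the same index. */
static void toy_racer_insert(struct toy_cache *c, size_t idx)
{
	static const char other_page = 0;

	(void)toy_cache_add(c, idx, &other_page);
}

/*
 * Check first, then insert. `racer` (may be NULL) runs in the window
 * between the two steps, standing in for another task.
 */
static int toy_check_then_add(struct toy_cache *c, size_t idx,
			      const void *page,
			      void (*racer)(struct toy_cache *, size_t))
{
	if (toy_cache_present(c, idx))
		return -EEXIST;
	if (racer)
		racer(c, idx);	/* the other task wins the race here */
	return toy_cache_add(c, idx, page);
}
```

Calling toy_check_then_add() with toy_racer_insert as the racer returns -EEXIST even though the earlier check passed, so the failure path after the insert still has to be handled.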

I also have a feeling that we've been trying to work around something else, but
I can't tell yet; I'll probably need to read up a bit more on how hugetlb does
its accounting and reservations.

Thanks,

-- 
Peter Xu