linux-kernel - Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures, honor failures

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210113195528.GD4605@ziepe.ca>
Date:   Wed, 13 Jan 2021 15:55:28 -0400
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     Pavel Tatashin <pasha.tatashin@...een.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...e.com>,
        David Hildenbrand <david@...hat.com>,
        Oscar Salvador <osalvador@...e.de>,
        Dan Williams <dan.j.williams@...el.com>,
        Sasha Levin <sashal@...nel.org>,
        Tyler Hicks <tyhicks@...ux.microsoft.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>, mike.kravetz@...cle.com,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Matthew Wilcox <willy@...radead.org>,
        David Rientjes <rientjes@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Linux Doc Mailing List <linux-doc@...r.kernel.org>,
        Ira Weiny <ira.weiny@...el.com>,
        linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures,
 honor failures

On Wed, Jan 13, 2021 at 02:43:50PM -0500, Pavel Tatashin wrote:
> On Fri, Dec 18, 2020 at 9:19 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> >
> > On Thu, Dec 17, 2020 at 05:02:03PM -0500, Pavel Tatashin wrote:
> > > Hi Jason,
> > >
> > > Thank you for your comments. My replies below.
> > >
> > > On Thu, Dec 17, 2020 at 3:50 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > > >
> > > > On Thu, Dec 17, 2020 at 01:52:41PM -0500, Pavel Tatashin wrote:
> > > > > +/*
> > > > > + * Verify that there are no unpinnable (movable) pages, if so return true.
> > > > > + * Otherwise an unpinnable pages is found return false, and unpin all pages.
> > > > > + */
> > > > > +static bool check_and_unpin_pages(unsigned long nr_pages, struct page **pages,
> > > > > +                               unsigned int gup_flags)
> > > > > +{
> > > > > +     unsigned long i, step;
> > > > > +
> > > > > +     for (i = 0; i < nr_pages; i += step) {
> > > > > +             struct page *head = compound_head(pages[i]);
> > > > > +
> > > > > +             step = compound_nr(head) - (pages[i] - head);
> > > >
> > > > You can't assume that all of a compound head is in the pages array,
> > > > this assumption would only work inside the page walkers if the page
> > > > was found in a PMD or something.
> > >
> > > I am not sure I understand your comment. The compound head is not
> > > taken from the pages array, and not assumed to be in it. It is exactly
> > > the same logic as that we currently have:
> > > https://soleen.com/source/xref/linux/mm/gup.c?r=a00cda3f#1565
> >
> > Oh, that existing logic is wrong too :( Another bug.
> 
> I do not think there is a bug.
> 
> > You can't skip pages in the pages[] array under the assumption they
> > are contiguous. ie the i+=step is wrong.
> 
> If pages[i] is part of a compound page, the other parts of this page
> must be sequential in this array for this compound page

That is true only if the PMD points to the page. If the PTE points to
a tail page then there is no requirement that other PTEs are
contiguous with the compount page.

At this point we have no idea if the GUP logic got this compound page
as a head page in a PMD or as a tail page from a PTE, so we can't
assume a contiguous run of addresses.

Look at split_huge_pmd() - it doesn't break up the compound page it
just converts the PMD to a PTE array and scatters the tail pages to
the PTE.

I understand Matt is pushing on this idea more by having compound
pages in the page cache, but still mapping tail pages when required.

> This is actually standard migration procedure, elsewhere in the kernel
> we migrate pages in exactly the same fashion: isolate and later
> migrate. The isolation works for LRU only pages.

But do other places cause a userspace visible random failure when LRU
isolation fails?

I don't like it at all, what is the user supposed to do?

Jason