Message-ID: <d2313c1d-1e50-49b7-bed7-840431af799a@arm.com>
Date: Thu, 23 Nov 2023 19:11:19 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Peter Xu <peterx@...hat.com>, Matthew Wilcox <willy@...radead.org>
Cc: Christoph Hellwig <hch@...radead.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Andrea Arcangeli <aarcange@...hat.com>,
James Houghton <jthoughton@...gle.com>,
Lorenzo Stoakes <lstoakes@...il.com>,
David Hildenbrand <david@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>,
John Hubbard <jhubbard@...dia.com>,
Yang Shi <shy828301@...il.com>,
Rik van Riel <riel@...riel.com>,
Hugh Dickins <hughd@...gle.com>,
Jason Gunthorpe <jgg@...dia.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Andrew Morton <akpm@...ux-foundation.org>,
linuxppc-dev@...ts.ozlabs.org, Mike Rapoport <rppt@...nel.org>,
Mike Kravetz <mike.kravetz@...cle.com>
Subject: Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd
processing
On 23/11/2023 17:22, Peter Xu wrote:
> On Thu, Nov 23, 2023 at 03:47:49PM +0000, Matthew Wilcox wrote:
>> It looks like ARM (in the person of Ryan) are going to add support for
>> something equivalent to hugepd.
>
> If it's about arm's cont_pte, then it looks ideal, because this series
> didn't yet touch cont_pte, assuming it'll just work. From that aspect, his
> work may help mine, and no immediate collapsing either.
Hi,
I'm not sure I've 100% understood the crossover between this series and my work
to support arm64's contpte mappings generally for anonymous and file-backed memory.
My approach is to transparently use contpte mappings when the core-mm requests
pte mappings that meet the requirements; it's all based around intercepting the
normal (non-hugetlb) helpers (e.g. set_ptes(), ptep_get() and friends). There is
no semantic change to the core-mm. See [1]. It relies on 1) the page cache using
large folios and 2) my "small-sized THP" series, which starts using
arbitrary-sized large folios for anonymous memory [2].
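To make that concrete, here is a very rough sketch of the kind of interception
I mean. This is illustrative only and not the actual code from [1];
contpte_suitable() is a made-up helper, and the real series lays out the
overrides and the management of the contiguous bit quite differently:

static inline bool contpte_suitable(unsigned long addr, pte_t pte,
				    unsigned int nr)
{
	/*
	 * Hypothetical check: the batch covers exactly one contpte block,
	 * and both the VA and the PA are block-aligned.
	 */
	return nr == CONT_PTES &&
	       IS_ALIGNED(addr, CONT_PTES * PAGE_SIZE) &&
	       IS_ALIGNED(pte_pfn(pte), CONT_PTES);
}

static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
			    pte_t *ptep, pte_t pte, unsigned int nr)
{
	unsigned int i;

	/* If the batch lines up with a contpte block, set the contig bit. */
	if (contpte_suitable(addr, pte, nr))
		pte = pte_mkcont(pte);

	/* Write nr consecutive ptes, advancing the pfn by one page each time. */
	for (i = 0; i < nr; i++, ptep++, addr += PAGE_SIZE) {
		set_pte(ptep, pte);
		pte = pfn_pte(pte_pfn(pte) + 1, pte_pgprot(pte));
	}
}

The point is that the core-mm just calls set_ptes() as it would anywhere else;
whether the underlying mapping ends up using PTE_CONT is an arch-internal detail.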
If I've understood this conversation correctly, there is an object called hugepd,
which today is only supported by powerpc, but which could allow the core-mm to
control the mapping granularity? I can see some value in exposing that control
to the core-mm in the (very) long term.
[1] https://lore.kernel.org/all/20231115163018.1303287-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/linux-mm/20231115132734.931023-1-ryan.roberts@arm.com/
Thanks,
Ryan
>
> There can be a slight performance difference which I need to measure for
> arm's cont_pte already for hugetlb, but I didn't worry much about that;
> quoting my commit message in the last patch:
>
> There may be a slight difference of how the loops run when processing
> GUP over a large hugetlb range on either ARM64 (e.g. CONT_PMD) or RISCV
> (mostly its Svnapot extension on 64K huge pages): each loop of
> __get_user_pages() will resolve one pgtable entry with the patch
> applied, rather than relying on the size of hugetlb hstate, the latter
> may cover multiple entries in one loop.
>
> However, the performance difference should hopefully not be a major
> concern, considering that GUP just recently got 57edfcfd3419 ("mm/gup:
> accelerate thp gup even for "pages != NULL""), and that's not part of a
> performance analysis but a side dish. If performance becomes a concern,
> we can consider handling CONT_PTE in follow_page(), for example.
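[ For illustration, a purely hypothetical user-space sketch of the
iteration-count difference being described, assuming the arm64/64K CONT_PTE
case where a 2M hugetlb page is made up of 32 contiguous 64K entries; this is
not the real __get_user_pages() code: ]

#include <stdio.h>

#define RANGE_SIZE	(2UL * 1024 * 1024)	/* one 2M hugetlb page */
#define HSTATE_SIZE	(2UL * 1024 * 1024)	/* old stride: hstate size */
#define ENTRY_SIZE	(64UL * 1024)		/* new stride: one cont-pte entry */

int main(void)
{
	unsigned long addr, iters;

	/* Old behaviour: one iteration of the outer loop per hstate-sized page. */
	for (addr = 0, iters = 0; addr < RANGE_SIZE; addr += HSTATE_SIZE)
		iters++;
	printf("hstate-sized stride: %lu iteration(s)\n", iters);	/* prints 1 */

	/* New behaviour: one iteration per page table entry. */
	for (addr = 0, iters = 0; addr < RANGE_SIZE; addr += ENTRY_SIZE)
		iters++;
	printf("entry-sized stride:  %lu iteration(s)\n", iters);	/* prints 32 */

	return 0;
}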
>
> So IMHO it can be slightly different compared to e.g. a page fault, because
> each fault is still pretty slow as a whole if there is one fault for each
> small pte (of a large folio / cont_pte), while the loop in GUP is still
> relatively tight and short compared to a fault. I'd boldly guess there is
> more low-hanging fruit out there for large folios outside the GUP areas.
>
> In all cases, it'll be interesting to know if Ryan has worked on cont_pte
> support for gup on large folios, and whether there are any performance numbers
> to share. It's definitely good news to me, because it means Ryan's work can
> also then benefit hugetlb if this series is merged; I just don't know
> how much difference there will be.
>
> Thanks,
>