Message-ID: <CEDBC792-DE5A-42CB-AA31-40C039470BD0@nvidia.com>
Date: Fri, 15 Feb 2019 14:20:37 -0800
From: Zi Yan <ziy@...dia.com>
To: <lsf-pc@...ts.linux-foundation.org>
CC: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...e.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Matthew Wilcox <willy@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Hugh Dickins <hughd@...gle.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Anshuman Khandual <anshuman.khandual@....com>,
John Hubbard <jhubbard@...dia.com>,
Mark Hairgrove <mhairgrove@...dia.com>,
Nitin Gupta <nigupta@...dia.com>,
David Nellans <dnellans@...dia.com>
Subject: [LSF/MM TOPIC] Generating physically contiguous memory
The Problem
----
Large pages and physically contiguous memory are important to devices,
such as GPUs, FPGAs, NICs and RDMA controllers, because they can often
reduce address translation overheads and hence achieve better
performance when operating on large pages (2MB and beyond). The same can
be said of CPU performance, of course, but there is an important
difference: GPUs and high-throughput devices often take a more severe
performance hit, in the event of a TLB miss, as compared to a CPU,
because larger volume of in-flight work is stalled due to the TLB miss
and the induced page table walks. The effect is sufficiently large that
such devices *really* want highly reliable ways to allocate large pages
to minimize TLB misses and reduce the duration of page table walks.
Approaches that reserve memory at boot time (such as hugetlbfs) lack
flexibility and are a compromise that would be nice to avoid. THPs, in
general, seem to be the proper way to go, because they are transparent
to userspace and provide large pages, but they are not perfect yet. The
community is still working on them, since 1) THP size is limited by the
page allocation system and 2) THP creation requires a lot of effort
(e.g., memory compaction and page reclamation on the critical path of
page allocations).
Possible solutions
----
1. I recently posted an RFC [1] about actively generating physically
contiguous memory from in-use pages after page allocation. This RFC
moves pages around and makes them physically contiguous when possible.
It is different from existing approaches, since it does not rely on page
allocation. On the other hand, this approach is still hindered by
non-movable pages scattered across memory, a closely related but
orthogonal problem; one possible solution to it was recently proposed by
Mel Gorman [2].
2. THPs could be a solution, as they provide large pages. THP avoids
memory reservation at boot time, but to meet the needs of these
high-throughput accelerators, i.e., a lot of large pages, we need to
make large pages easier to produce, namely by increasing the success
rate of THP allocations and decreasing their overheads. Mel Gorman has
posted a related patchset [3].
It is also possible to generate THPs in the background: like what
khugepaged does right now, by periodically performing memory compaction
to lower the overall fragmentation level, or by maintaining pools of
THPs for future use. But these solutions still face the same problems.
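For reference, the background machinery mentioned above is already
tunable from userspace today. A sketch of the relevant knobs (the sysfs
and procfs paths below are the standard ones on current kernels; the
write requires root):

```shell
# System-wide THP policy: always / madvise / never
cat /sys/kernel/mm/transparent_hugepage/enabled

# khugepaged pacing: how aggressively the background thread collapses
# base pages into THPs
cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
cat /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs

# Trigger a one-off full memory compaction run across all zones
echo 1 > /proc/sys/vm/compact_memory
```

Tuning these only shifts when the compaction work happens; it does not
remove the underlying fragmentation problem.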
3. A more restricted but more reliable way might be using libhugetlbfs.
It reserves memory dedicated to large page allocations, so less effort
is required to obtain large pages. It also supports page sizes larger
than 2MB, which further reduces address translation overheads. But AFAIK
device drivers are not able to directly grab large pages from
libhugetlbfs, which is something devices want.
4. Recently Matthew Wilcox mentioned that his XArray is going to support
arbitrarily sized pages [4], which would help maintain physically
contiguous ranges once they are created (e.g., by my RFC). Once my RFC
generates physically contiguous memory, XArrays would preserve the page
size and prevent reclaim/compaction from breaking the ranges apart.
Arbitrarily sized pages can still benefit devices when pages larger than
2MB become very difficult to get.
Feel free to provide your comments.
Thanks.
[1] https://lore.kernel.org/lkml/20190215220856.29749-1-zi.yan@sent.com/
[2]
https://lore.kernel.org/lkml/20181123114528.28802-1-mgorman@techsingularity.net/
[3]
https://lore.kernel.org/lkml/20190118175136.31341-1-mgorman@techsingularity.net/
[4]
https://lore.kernel.org/lkml/20190208042448.GB21860@bombadil.infradead.org/
--
Best Regards,
Yan Zi