Message-ID: <476868f6-8d32-b8d2-855e-4b19e8a54cc2@redhat.com>
Date: Tue, 23 Nov 2021 13:05:01 +0100
From: David Hildenbrand <david@...hat.com>
To: Mina Almasry <almasrymina@...gle.com>,
Jonathan Corbet <corbet@....net>
Cc: Matthew Wilcox <willy@...radead.org>,
"Paul E . McKenney" <paulmckrcu@...com>,
Yu Zhao <yuzhao@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Xu <peterx@...hat.com>,
Ivan Teterevkov <ivan.teterevkov@...anix.com>,
Florian Schmidt <florian.schmidt@...anix.com>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH v7] mm: Add PM_THP_MAPPED to /proc/pid/pagemap
On 23.11.21 01:01, Mina Almasry wrote:
> Add PM_THP_MAPPED to allow userspace to detect whether a given virtual
> address is currently mapped by a transparent huge page. An example use
> case is a process requesting THPs from the kernel (via a huge tmpfs
> mount, for example) for a performance-critical region of memory.
> Userspace may want to query whether the kernel is actually backing this
> memory with hugepages.
>
> The PM_THP_MAPPED bit is set if the virtual address is mapped at the PMD
> level and the underlying page is a transparent huge page.
>
> A few options were considered:
> 1. Add /proc/pid/pageflags that exports the same info as
>    /proc/kpageflags. This is not appropriate because many kpageflags are
>    inappropriate to expose to userspace processes.
> 2. Simply get this info from the existing /proc/pid/smaps interface.
>    There are a couple of issues with that:
>    1. /proc/pid/smaps output is human readable and unfriendly to
>       parse programmatically.
>    2. /proc/pid/smaps is slow because it must read the whole memory range
>       rather than a small range we care about. The cost of reading
>       /proc/pid/smaps into userspace buffers is about 800us per call,
>       and this doesn't include parsing the output to get the information
>       you need. The cost of querying 1 virtual address in /proc/pid/pagemap,
>       however, is around 5-7us.
>
> Tested manually by adding logging into transhuge-stress, and by
> allocating THP and querying the PM_THP_MAPPED flag at those
> virtual addresses.
>
> Signed-off-by: Mina Almasry <almasrymina@...gle.com>
>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Matthew Wilcox <willy@...radead.org>
> Cc: David Rientjes <rientjes@...gle.com>
> Cc: Paul E. McKenney <paulmckrcu@...com>
> Cc: Yu Zhao <yuzhao@...gle.com>
> Cc: Jonathan Corbet <corbet@....net>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Peter Xu <peterx@...hat.com>
> Cc: Ivan Teterevkov <ivan.teterevkov@...anix.com>
> Cc: Florian Schmidt <florian.schmidt@...anix.com>
> Cc: linux-kernel@...r.kernel.org
> Cc: linux-fsdevel@...r.kernel.org
> Cc: linux-mm@...ck.org
>
>
> ---
>
> Changes in v7:
> - Added clarification that smaps is only slow because it looks at the
> whole address space.
>
> Changes in v6:
> - Renamed to PM_THP_MAPPED
> - Removed changes to transhuge-stress
>
> Changes in v5:
> - Added justification for this interface in the commit message!
>
> Changes in v4:
> - Removed unnecessary moving of flags variable declaration
>
> Changes in v3:
> - Renamed PM_THP to PM_HUGE_THP_MAPPING
> - Fixed checks to set PM_HUGE_THP_MAPPING
> - Added PM_HUGE_THP_MAPPING docs
> ---
> Documentation/admin-guide/mm/pagemap.rst | 3 ++-
> fs/proc/task_mmu.c | 3 +++
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> index fdc19fbc10839..8a0f0064ff336 100644
> --- a/Documentation/admin-guide/mm/pagemap.rst
> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -23,7 +23,8 @@ There are four components to pagemap:
> * Bit 56 page exclusively mapped (since 4.2)
> * Bit 57 pte is uffd-wp write-protected (since 5.13) (see
> :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
> - * Bits 57-60 zero
> + * Bit 58 page is a huge (PMD size) THP mapping
> + * Bits 59-60 zero
> * Bit 61 page is file-page or shared-anon (since 3.5)
> * Bit 62 page swapped
> * Bit 63 page present
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index ad667dbc96f5c..d784a97aa209a 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1302,6 +1302,7 @@ struct pagemapread {
> #define PM_SOFT_DIRTY BIT_ULL(55)
> #define PM_MMAP_EXCLUSIVE BIT_ULL(56)
> #define PM_UFFD_WP BIT_ULL(57)
> +#define PM_THP_MAPPED BIT_ULL(58)
> #define PM_FILE BIT_ULL(61)
> #define PM_SWAP BIT_ULL(62)
> #define PM_PRESENT BIT_ULL(63)
> @@ -1456,6 +1457,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
>
> if (page && page_mapcount(page) == 1)
> flags |= PM_MMAP_EXCLUSIVE;
> + if (page && is_transparent_hugepage(page))
> + flags |= PM_THP_MAPPED;
>
> for (; addr != end; addr += PAGE_SIZE) {
> pagemap_entry_t pme = make_pme(frame, flags);
>
Thanks!
Reviewed-by: David Hildenbrand <david@...hat.com>
--
Thanks,
David / dhildenb
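
For readers who want to try the interface described in the quoted patch,
here is a minimal userspace sketch. It assumes a kernel that carries this
patch (bit 58 is taken from the diff above and has a different meaning, or
none, on kernels without it). The helper names pagemap_offset(),
entry_is_thp_mapped() and read_pagemap_entry() are hypothetical and are not
part of the patch.

```c
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 58 as defined by the patch above; an assumption on any other kernel. */
#define PM_THP_MAPPED (1ULL << 58)

/* pagemap holds one 64-bit entry per virtual page, indexed by page number. */
static off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
	return (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/* Returns 1 if the entry has the PM_THP_MAPPED bit set, 0 otherwise. */
static int entry_is_thp_mapped(uint64_t entry)
{
	return (entry & PM_THP_MAPPED) != 0;
}

/*
 * Read the pagemap entry covering addr in the calling process.
 * Returns 0 on success and -1 on any open or read failure.
 */
static int read_pagemap_entry(const void *addr, uint64_t *entry)
{
	int fd = open("/proc/self/pagemap", O_RDONLY);
	off_t off;
	ssize_t n;

	if (fd < 0)
		return -1;

	off = pagemap_offset((uintptr_t)addr, sysconf(_SC_PAGESIZE));
	n = pread(fd, entry, sizeof(*entry), off);
	close(fd);

	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

A caller would touch the memory first so that it is faulted in, then call
read_pagemap_entry(ptr, &entry) and test entry_is_thp_mapped(entry). Reading
a single 8-byte entry this way is the per-address query that the commit
message compares against parsing /proc/pid/smaps.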