Message-ID: <YYuCaNXikls/9JhS@t490s>
Date:   Wed, 10 Nov 2021 16:27:20 +0800
From:   Peter Xu <peterx@...hat.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Mina Almasry <almasrymina@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        "Paul E . McKenney" <paulmckrcu@...com>,
        Yu Zhao <yuzhao@...gle.com>, Jonathan Corbet <corbet@....net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ivan Teterevkov <ivan.teterevkov@...anix.com>,
        Florian Schmidt <florian.schmidt@...anix.com>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH v4] mm: Add PM_HUGE_THP_MAPPING to /proc/pid/pagemap

On Wed, Nov 10, 2021 at 09:14:42AM +0100, David Hildenbrand wrote:
> On 10.11.21 08:03, Peter Xu wrote:
> > Hi, Mina,
> > 
> > Sorry to comment late.
> > 
> > On Sun, Nov 07, 2021 at 03:57:54PM -0800, Mina Almasry wrote:
> >> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> >> index fdc19fbc10839..8a0f0064ff336 100644
> >> --- a/Documentation/admin-guide/mm/pagemap.rst
> >> +++ b/Documentation/admin-guide/mm/pagemap.rst
> >> @@ -23,7 +23,8 @@ There are four components to pagemap:
> >>      * Bit  56    page exclusively mapped (since 4.2)
> >>      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
> >>        :ref:`Documentation/admin-guide/mm/userfaultfd.rst <userfaultfd>`)
> >> -    * Bits 57-60 zero
> >> +    * Bit  58    page is a huge (PMD size) THP mapping
> >> +    * Bits 59-60 zero
> >>      * Bit  61    page is file-page or shared-anon (since 3.5)
> >>      * Bit  62    page swapped
> >>      * Bit  63    page present
> >> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> >> index ad667dbc96f5c..6f1403f83b310 100644
> >> --- a/fs/proc/task_mmu.c
> >> +++ b/fs/proc/task_mmu.c
> >> @@ -1302,6 +1302,7 @@ struct pagemapread {
> >>  #define PM_SOFT_DIRTY		BIT_ULL(55)
> >>  #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
> >>  #define PM_UFFD_WP		BIT_ULL(57)
> >> +#define PM_HUGE_THP_MAPPING	BIT_ULL(58)
> > 
> > The ending "_MAPPING" seems redundant to me, how about just call it "PM_THP" or
> > "PM_HUGE" (as THP also means HUGE already)?
> > 
> > IMHO the core problem is about permission controls, and it seems to me we're
> > actually trying to work around it by duplicating some information we already
> > have, so it's kind of a pity.  I'm totally not against this patch, but imho
> > it'd be nicer if the permission part were enhanced, rather than adding a new
> > but slightly duplicated interface.
> 
> It's not a permission problem AFAIKS: even with permissions "changed",
> any attempt to use /proc/kpageflags is just racy. Let's not go down that
> path, it's really the wrong mechanism to export to random userspace.

I agree it's racy, but IMHO that's fine.  These are hints for userspace to make
decisions; they cannot always be right.  Even if we fetch the entry atomically
and see that this pte is swapped out, the page can be accessed right afterwards
and be back in memory again.  The result would only be reliable if we could
freeze the whole pgtable, which we can't, so these can only be used as hints.
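
To illustrate the "hints only" point, here is a minimal sketch of how userspace
might decode one pagemap entry.  It assumes the bit layout from the quoted hunk;
bit 58 is the bit proposed by this patch, not an established ABI, and the
struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the quoted pagemap.rst / task_mmu.c hunks. */
#define PM_SOFT_DIRTY		(UINT64_C(1) << 55)
#define PM_MMAP_EXCLUSIVE	(UINT64_C(1) << 56)
#define PM_UFFD_WP		(UINT64_C(1) << 57)
/* Proposed in the quoted patch; not necessarily in any released kernel. */
#define PM_HUGE_THP_MAPPING	(UINT64_C(1) << 58)
#define PM_FILE			(UINT64_C(1) << 61)
#define PM_SWAP			(UINT64_C(1) << 62)
#define PM_PRESENT		(UINT64_C(1) << 63)

/* Hypothetical helper type: a snapshot that may be stale once returned. */
struct pm_hint {
	bool present;
	bool swapped;
	bool thp;
};

/* Decode a single 64-bit pagemap entry into hint flags. */
static struct pm_hint pm_decode(uint64_t entry)
{
	struct pm_hint h = {
		.present = (entry & PM_PRESENT) != 0,
		.swapped = (entry & PM_SWAP) != 0,
		.thp     = (entry & PM_HUGE_THP_MAPPING) != 0,
	};

	return h;
}
```

Nothing here pins the page table, so a caller should treat the result as a
snapshot that may already have changed by the time it is inspected.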

> 
> We do have an interface to access this information from userspace
> already: /proc/self/smaps IIRC. Mina commented that they are seeing
> performance issues with that approach.
> 
> It would be valuable to add these details to the patch description,
> including a performance difference when using both interfaces we have
> available. As the patch description stands, there is no explanation
> "why" we want this change.

I didn't notice Mina mentioning performance issues with kpageflags; if there are
any, then I agree this solution helps.  I doubt that performance is an issue,
though: THP info shouldn't change rapidly, so it should only serve as a hint for
sanity checks (e.g., to make sure no unwanted THP split is happening).  Such
scanning should not need to be super fast; it could be done with a relatively
long scanning period.  If there is a performance concern, then yes, it would be
great to mention it in the commit message too.
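
As a rough sketch of such a low-frequency sanity scan, the following counts how
many pages in a range report the proposed THP bit.  It assumes bit 58 as
defined in the quoted patch (not an established ABI) and relies on pagemap
holding one 64-bit entry per virtual page; the helper names are hypothetical.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Proposed in the quoted patch; not necessarily in any released kernel. */
#define PM_HUGE_THP_MAPPING	(UINT64_C(1) << 58)

/* pagemap holds one 64-bit entry per virtual page. */
static off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
	return (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
}

/*
 * Count pages in [start, start + len) whose entry has the proposed THP bit
 * set.  fd is an already opened /proc/<pid>/pagemap.  Returns -1 on a failed
 * or short read.
 */
static long count_thp_pages(int fd, uintptr_t start, size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	long count = 0;

	for (uintptr_t va = start; va < start + len; va += (uintptr_t)page_size) {
		uint64_t entry;
		ssize_t n = pread(fd, &entry, sizeof(entry),
				  pagemap_offset(va, page_size));

		if (n != (ssize_t)sizeof(entry))
			return -1;
		if (entry & PM_HUGE_THP_MAPPING)
			count++;
	}

	return count;
}
```

A monitor could call count_thp_pages() on a region it expects to be THP-backed
at a long interval and only warn when the count drops, since any single reading
may already be stale.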

-- 
Peter Xu
