lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 7 Jun 2021 20:55:57 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Yang Shi <shy828301@...il.com>
Cc:     Zi Yan <ziy@...dia.com>, nao.horiguchi@...il.com,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: mempolicy: don't have to split pmd for huge zero page

On Mon 07-06-21 10:00:01, Yang Shi wrote:
> On Sun, Jun 6, 2021 at 11:21 PM Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Fri 04-06-21 13:35:13, Yang Shi wrote:
> > > When trying to migrate pages to obey mempolicy, the huge zero page is
> > > split then the page table walk at PTE level just skips zero page.  So it
> > > seems pointless to split huge zero page, it could be just skipped like
> > > base zero page.
> >
> > My THP knowledge is not the best but this is incorrect AIACS. Huge zero
> > page is not split. We do split the pmd which is mapping the said page. I
> > suspect you refer to vm_normal_page when talking about a zero page but
> > please be aware that huge zero page is not a normal zero page. It is
> > allocated dynamically (see get_huge_zero_page).
> 
> For a normal huge page, yes, split_huge_pmd() just splits pmd. But
> actually the base zero pfn will be inserted to PTEs when splitting
> huge zero pmd. Please check __split_huge_zero_page_pmd() out.

My bad. I didn't have a look all the way down there. The naming
suggested that this is purely page table operations and I have suspected
that ptes just point to the offset of the THP.

But I am obviously wrong here. Sorry about that.

> I should make this point clearer in the commit log. Sorry for the confusion.
> 
> >
> > So in the end you patch disables mbind of zero pages to a target node
> > and that is a regression.
> 
> Do we really migrate zero page? IIUC zero page is just skipped by
> vm_normal_page() check in queue_pages_pte_range(), isn't it?

Yeah, normal zero pages are skipped indeed. I haven't studied why this
is the case yet. It surely sounds a bit suspicious because this is an
explicit request to migrate memory and if the zero page is misplaced it
should be moved. On the hand this would increase RSS so maybe this is
the point.

> > Have you tested the patch?
> 
> No, just build test. I thought this change was straightforward.
> 
> >
> > > Set ACTION_CONTINUE to prevent the walk_page_range() split the pmd for
> > > this case.
> >
> > Btw. this changelog is missing a problem statement. I suspect there is
> > no actual problem that it should fix and it is likely driven by reading
> > the code. Right?
> 
> The actual problem is it is pointless to split a huge zero pmd. Yes,
> it is driven by visual inspection.

Is there any actual workload that cares? This is quite a subtle area so
I would be careful to do changes just because...
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ