[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHbLzkooYAi=Hb0=oJ+2b6G=h5Sx4jnyo5L0nPYjDcBqBHnfug@mail.gmail.com>
Date: Mon, 7 Jun 2021 15:02:39 -0700
From: Yang Shi <shy828301@...il.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Zi Yan <ziy@...dia.com>, nao.horiguchi@...il.com,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: mempolicy: don't have to split pmd for huge zero page
On Mon, Jun 7, 2021 at 11:55 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Mon 07-06-21 10:00:01, Yang Shi wrote:
> > On Sun, Jun 6, 2021 at 11:21 PM Michal Hocko <mhocko@...e.com> wrote:
> > >
> > > On Fri 04-06-21 13:35:13, Yang Shi wrote:
> > > > When trying to migrate pages to obey mempolicy, the huge zero page is
> > > > split then the page table walk at PTE level just skips zero page. So it
> > > > seems pointless to split huge zero page, it could be just skipped like
> > > > base zero page.
> > >
> > > My THP knowledge is not the best but this is incorrect AIACS. Huge zero
> > > page is not split. We do split the pmd which is mapping the said page. I
> > > suspect you refer to vm_normal_page when talking about a zero page but
> > > please be aware that huge zero page is not a normal zero page. It is
> > > allocated dynamically (see get_huge_zero_page).
> >
> > For a normal huge page, yes, split_huge_pmd() just splits pmd. But
> > actually the base zero pfn will be inserted to PTEs when splitting
> > huge zero pmd. Please check __split_huge_zero_page_pmd() out.
>
> My bad. I didn't have a look all the way down there. The naming
> suggested that this is purely page table operations and I have suspected
> that ptes just point to the offset of the THP.
>
> But I am obviously wrong here. Sorry about that.
>
> > I should make this point clearer in the commit log. Sorry for the confusion.
> >
> > >
> > > So in the end you patch disables mbind of zero pages to a target node
> > > and that is a regression.
> >
> > Do we really migrate zero page? IIUC zero page is just skipped by
> > vm_normal_page() check in queue_pages_pte_range(), isn't it?
>
> Yeah, normal zero pages are skipped indeed. I haven't studied why this
> is the case yet. It surely sounds a bit suspicious because this is an
> explicit request to migrate memory and if the zero page is misplaced it
> should be moved. On the hand this would increase RSS so maybe this is
> the point.
The zero page is a global shared page, I don't think "misplace"
applies to it. It doesn't make too much sense to migrate a shared
page. Actually there is page mapcount check in migrate_page_add() to
skip shared normal pages as well.
>
> > > Have you tested the patch?
> >
> > No, just build test. I thought this change was straightforward.
> >
> > >
> > > > Set ACTION_CONTINUE to prevent the walk_page_range() split the pmd for
> > > > this case.
> > >
> > > Btw. this changelog is missing a problem statement. I suspect there is
> > > no actual problem that it should fix and it is likely driven by reading
> > > the code. Right?
> >
> > The actual problem is it is pointless to split a huge zero pmd. Yes,
> > it is driven by visual inspection.
>
> Is there any actual workload that cares? This is quite a subtle area so
> I would be careful to do changes just because...
I'm not sure whether there is measurable improvement for actual
workloads, but I believe this change does eliminate some unnecessary
work.
I think the test shown in the previous email gives us some confidence
that the change doesn't have regression.
> --
> Michal Hocko
> SUSE Labs
Powered by blists - more mailing lists