[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210604162402.iclcdd3ywynkoamy@box.shutemov.name>
Date: Fri, 4 Jun 2021 19:24:02 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Yang Shi <shy828301@...il.com>,
Wang Yugui <wangyugui@...-tech.com>,
Matthew Wilcox <willy@...radead.org>,
Naoya Horiguchi <naoya.horiguchi@....com>,
Alistair Popple <apopple@...dia.com>,
Ralph Campbell <rcampbell@...dia.com>, Zi Yan <ziy@...dia.com>,
Miaohe Lin <linmiaohe@...wei.com>,
Minchan Kim <minchan@...nel.org>, Jue Wang <juew@...gle.com>,
Peter Xu <peterx@...hat.com>, Jan Kara <jack@...e.cz>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/7] mm/thp: fix page_vma_mapped_walk() if huge page
mapped by ptes
On Tue, Jun 01, 2021 at 02:13:21PM -0700, Hugh Dickins wrote:
> Running certain tests with a DEBUG_VM kernel would crash within hours,
> on the total_mapcount BUG() in split_huge_page_to_list(), while trying
> to free up some memory by punching a hole in a shmem huge page: split's
> try_to_unmap() was unable to find all the mappings of the page (which,
> on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
>
> Crash dumps showed two tail pages of a shmem huge page remained mapped
> by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
> of a long unmapped range; and no page table had yet been allocated for
> the head of the huge page to be mapped into.
>
> Although designed to handle these odd misaligned huge-page-mapped-by-pte
> cases, page_vma_mapped_walk() falls short by returning false prematurely
> when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
> are cases when a huge page may span the boundary, with ptes present in
> the next.
Oh. My bad. I guess it was pain to debug.
> Restructure page_vma_mapped_walk() as a loop to continue in these cases,
> while keeping its layout much as before. Add a step_forward() helper to
> advance pvmw->address across those boundaries: originally I tried to use
> mm's standard p?d_addr_end() macros, but hit the same crash 512 times
> less often: because of the way redundant levels are folded together,
> but folded differently in different configurations, it was just too
> difficult to use them correctly; and step_forward() is simpler anyway.
>
> Merged various other minor fixes and cleanups into page_vma_mapped_walk()
> as I worked on it: which I find much easier to enumerate here than to
> prise apart into separate commits.
But it makes it harder to review...
> Handle all of the hugetlbfs PageHuge() case once at the start,
> so we don't need to worry about it again further down.
>
> Sometimes local copy of pvmw->page was used, sometimes pvmw->page:
> just use pvmw->page throughout (and continue to use pvmw->address
> throughout, though we could take a local copy instead).
>
> Use pmd_read_atomic() with barrier() instead of READ_ONCE() for pmde:
> some architectures (e.g. i386 with PAE) have a multi-word pmd entry,
> for which READ_ONCE() is not good enough.
>
> Re-evaluate pmde after taking lock, then use it in subsequent tests,
> instead of repeatedly dereferencing pvmw->pmd pointer.
>
> Rearrange the !pmd_present block to follow the same "return not_found,
> return not_found, return true" pattern as the block above it (note:
> returning not_found there is never premature, since the existence or
> prior existence of a huge pmd guarantees good alignment).
>
> Adjust page table boundary test in case address was not page-aligned.
>
> Reset pvmw->pte to NULL after unmapping that page table.
>
> Respect the 80-column line limit.
>
> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> Signed-off-by: Hugh Dickins <hughd@...gle.com>
> Cc: <stable@...r.kernel.org>
I tried to review it and superficially it looks good, but it has to be
split into bunch of patches.
> /* when pud is not present, pte will be NULL */
> - pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
> + pvmw->pte = huge_pte_offset(mm, pvmw->address,
> + page_size(pvmw->page));
AFAICS, it exactly fits into 80-column.
> if (!pvmw->pte)
> return false;
>
> - pvmw->ptl = huge_pte_lockptr(page_hstate(page), mm, pvmw->pte);
> + pvmw->ptl = huge_pte_lockptr(page_hstate(pvmw->page),
> + mm, pvmw->pte);
And this one end on 79.
Hm?
--
Kirill A. Shutemov
Powered by blists - more mailing lists