lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 4 Jun 2021 19:24:02 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Yang Shi <shy828301@...il.com>,
        Wang Yugui <wangyugui@...-tech.com>,
        Matthew Wilcox <willy@...radead.org>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        Alistair Popple <apopple@...dia.com>,
        Ralph Campbell <rcampbell@...dia.com>, Zi Yan <ziy@...dia.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Minchan Kim <minchan@...nel.org>, Jue Wang <juew@...gle.com>,
        Peter Xu <peterx@...hat.com>, Jan Kara <jack@...e.cz>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/7] mm/thp: fix page_vma_mapped_walk() if huge page
 mapped by ptes

On Tue, Jun 01, 2021 at 02:13:21PM -0700, Hugh Dickins wrote:
> Running certain tests with a DEBUG_VM kernel would crash within hours,
> on the total_mapcount BUG() in split_huge_page_to_list(), while trying
> to free up some memory by punching a hole in a shmem huge page: split's
> try_to_unmap() was unable to find all the mappings of the page (which,
> on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
> 
> Crash dumps showed two tail pages of a shmem huge page remained mapped
> by pte: ptes in a non-huge-aligned vma of a gVisor process, at the end
> of a long unmapped range; and no page table had yet been allocated for
> the head of the huge page to be mapped into.
> 
> Although designed to handle these odd misaligned huge-page-mapped-by-pte
> cases, page_vma_mapped_walk() falls short by returning false prematurely
> when !pmd_present or !pud_present or !p4d_present or !pgd_present: there
> are cases when a huge page may span the boundary, with ptes present in
> the next.

Oh. My bad. I guess it was pain to debug.

> Restructure page_vma_mapped_walk() as a loop to continue in these cases,
> while keeping its layout much as before. Add a step_forward() helper to
> advance pvmw->address across those boundaries: originally I tried to use
> mm's standard p?d_addr_end() macros, but hit the same crash 512 times
> less often: because of the way redundant levels are folded together,
> but folded differently in different configurations, it was just too
> difficult to use them correctly; and step_forward() is simpler anyway.
> 
> Merged various other minor fixes and cleanups into page_vma_mapped_walk()
> as I worked on it: which I find much easier to enumerate here than to
> prise apart into separate commits.

But it makes it harder to review...

> Handle all of the hugetlbfs PageHuge() case once at the start,
> so we don't need to worry about it again further down.
> 
> Sometimes local copy of pvmw->page was used, sometimes pvmw->page:
> just use pvmw->page throughout (and continue to use pvmw->address
> throughout, though we could take a local copy instead).
> 
> Use pmd_read_atomic() with barrier() instead of READ_ONCE() for pmde:
> some architectures (e.g. i386 with PAE) have a multi-word pmd entry,
> for which READ_ONCE() is not good enough.
> 
> Re-evaluate pmde after taking lock, then use it in subsequent tests,
> instead of repeatedly dereferencing pvmw->pmd pointer.
> 
> Rearrange the !pmd_present block to follow the same "return not_found,
> return not_found, return true" pattern as the block above it (note:
> returning not_found there is never premature, since the existence or
> prior existence of a huge pmd guarantees good alignment).
> 
> Adjust page table boundary test in case address was not page-aligned.
> 
> Reset pvmw->pte to NULL after unmapping that page table.
> 
> Respect the 80-column line limit.
> 
> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> Signed-off-by: Hugh Dickins <hughd@...gle.com>
> Cc: <stable@...r.kernel.org>

I tried to review it and superficially it looks good, but it has to be
split into bunch of patches.


>  		/* when pud is not present, pte will be NULL */
> -		pvmw->pte = huge_pte_offset(mm, pvmw->address, page_size(page));
> +		pvmw->pte = huge_pte_offset(mm, pvmw->address,
> +					    page_size(pvmw->page));

AFAICS, it exactly fits into 80-column.

>  		if (!pvmw->pte)
>  			return false;
>  
> -		pvmw->ptl = huge_pte_lockptr(page_hstate(page), mm, pvmw->pte);
> +		pvmw->ptl = huge_pte_lockptr(page_hstate(pvmw->page),
> +					     mm, pvmw->pte);

And this one end on 79.

Hm?

-- 
 Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ