Message-ID: <20170724130613.GK25221@dhcp22.suse.cz>
Date: Mon, 24 Jul 2017 15:06:13 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Yang Shi <yang.shi@...aro.org>,
Laura Abbott <labbott@...hat.com>,
Vinayak Menon <vinmenon@...eaurora.org>,
zhong jiang <zhongjiang@...wei.com>
Subject: Re: [RFC PATCH 4/4] mm, page_ext: move page_ext_init() after
page_alloc_init_late()
On Thu 20-07-17 15:40:29, Vlastimil Babka wrote:
> Commit b8f1a75d61d8 ("mm: call page_ext_init() after all struct pages are
> initialized") has avoided a a NULL pointer dereference due to
> DEFERRED_STRUCT_PAGE_INIT clashing with page_ext, by calling page_ext_init()
> only after the deferred struct page init has finished. Later commit
> fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") avoided the
> underlying issue differently and moved the page_ext_init() call back to where
> it was before.
>
> However, there are two problems with the current code:
> - on very large machines, page_ext_init() may fail to allocate the page_ext
> structures, because deferred struct page init hasn't yet started, and the
> pre-inited part might be too small.
> This has been observed on a 3TB machine with page_owner=on. Although it
> was an older kernel where page_owner hadn't yet been converted to the stack
> depot, and thus page_ext was larger, the fundamental problem is still present
> in mainline.
I was about to suggest using the memblock/bootmem allocator but it seems
that page_ext_init is called past mm_init. Is there any specific
reason why we cannot do the per-section initialization along with the
rest of the memory section init code which should have an early
allocator available?
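
To illustrate what I mean, something along these lines (completely
untested sketch; the hook point, the helper name and the allocator call
are only illustrative):

/*
 * Illustration only: allocate the per-section page_ext from the early
 * memblock allocator while the memory sections are being set up, so we
 * do not depend on the buddy allocator (and thus on the deferred struct
 * page init) at all.
 */
static int __init early_init_section_page_ext(unsigned long pfn, int nid)
{
	struct mem_section *section = __pfn_to_section(pfn);
	unsigned long entry_size = sizeof(struct page_ext); /* + extra size of enabled ops */
	void *base;

	if (section->page_ext)
		return 0;

	base = memblock_virt_alloc_try_nid_nopanic(entry_size * PAGES_PER_SECTION,
			SMP_CACHE_BYTES, 0, BOOTMEM_ALLOC_ACCESSIBLE, nid);
	if (!base)
		return -ENOMEM;

	/* same offsetting trick as the current sparse page_ext code */
	pfn = SECTION_ALIGN_DOWN(pfn);
	section->page_ext = (void *)base - entry_size * pfn;
	return 0;
}

This would be called from the section init path rather than from a late
initcall-like context, so the large allocations wouldn't depend on how
much memory the deferred init has already released.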
> - page_owner's init_pages_in_zone() is called before deferred struct page init
> has started, so it will encounter uninitialized struct pages. This currently
> happens to cause no harm, because the memmap array is pre-zeroed on
> allocation and thus the "if (page_zone(page) != zone)" check is negative, but
> that pre-zeroing guarantee might change soon.
Yes, this is annoying and a bug IMHO. We shouldn't consider spanned
pages but rather the maximum valid pfn for the zone. The rest simply
cannot be used by anybody so there shouldn't be any page_ext work to do
for it. Or am I missing something?
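
IOW, something like the following in init_pages_in_zone() is what I have
in mind (untested; clamping on first_deferred_pfn is just to show the
idea, the real cutoff would need more thought):

static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
{
	unsigned long pfn = zone->zone_start_pfn;
	unsigned long end_pfn = zone_end_pfn(zone);

#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
	/*
	 * Illustration only: do not walk into the part of the zone which
	 * the deferred struct page init hasn't touched yet. Nobody can
	 * use those pages so there is no page_owner state to set up for
	 * them here.
	 */
	end_pfn = min(end_pfn, pgdat->first_deferred_pfn);
#endif

	/* ... the existing pfn walk, bounded by end_pfn ... */
}

That way we wouldn't rely on the memmap being pre-zeroed at all for the
not-yet-initialized tail of the zone.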
--
Michal Hocko
SUSE Labs