[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACjP9X8xXMXwbku7yvpGV72XGZ2ZeAcEi4vtFVe_rf9+QZ19XA@mail.gmail.com>
Date: Thu, 15 Mar 2018 15:53:22 +0100
From: Daniel Vacek <neelx@...hat.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Naresh Kamboju <naresh.kamboju@...aro.org>,
Sudeep Holla <sudeep.holla@....com>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
open list <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Paul Burton <paul.burton@...tec.com>,
Pavel Tatashin <pasha.tatashin@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>,
stable <stable@...r.kernel.org>
Subject: Re: [PATCH] mm/page_alloc: fix boot hang in memmap_init_zone
On Thu, Mar 15, 2018 at 12:50 PM, Michal Hocko <mhocko@...nel.org> wrote:
> On Thu 15-03-18 02:30:41, Daniel Vacek wrote:
>> On Wed, Mar 14, 2018 at 3:17 PM, Michal Hocko <mhocko@...nel.org> wrote:
>> > On Tue 13-03-18 23:42:40, Daniel Vacek wrote:
>> >> On some architectures (reported on arm64) commit 864b75f9d6b01 ("mm/page_alloc: fix memmap_init_zone pageblock alignment")
>> >> causes a boot hang. This patch fixes the hang making sure the alignment
>> >> never steps back.
>> >
>> > I am sorry to be complaining again, but the code is so obscure that I
>>
>> No worries, I'm glad for any review. Which code exactly you do find
>> obscure? This patch or my former fix or the original commit
>> introducing memblock_next_valid_pfn()? Coz I'd agree the original
>> commit looks pretty obscure...
>
> As mentioned in the other email, the whole going back and forth in the
> same loop is just too ugly to live.
It's not really supposed to go back, but I guess you understand.
>> > would _really_ appreciate some more information about what is going
>> > on here. memblock_next_valid_pfn will most likely return a pfn within
>> > the same memblock and the alignment will move it before the old pfn
>> > which is not valid - so the block has some holes. Is that correct?
>>
>> I do not understand what you mean by 'pfn within the same memblock'?
>
> Sorry, I should have said in the same pageblock
>
>> And by 'the block has some holes'?
>
> memblock_next_valid_pfn clearly returns pfn which is within a pageblock
> and that is why we do not initialize pages in the begining of the block
> while move_freepages_block does really expect the full pageblock to be
> initialized properly. That is the fundamental problem, right?
Yes, that's correct.
>> memblock has types 'memory' (as usable memory) and 'reserved' (for
>> unusable mem), if I understand correctly.
>
> We might not have struct pages for invalid pfns. That really depends on
> the memory mode. Sure sparse mem model will usually allocate struct
> pages for whole memory sections but that is not universally true and
> adding such a suble assumption is simply wrong.
This is gray area for me. But if I understand correctly this
assumption comes from the code. It was already there and got broken
hence I was trying to keep it. If anything needs redesigning I'm all
for it. But I was just calming the fire here. I only didn't test on
arm, which seems to be the only one different.
> I suspect you are making strong assumptions based on a very specific
> implementation which might be not true in general. That was the feeling
> I've had since the patch was proposed for the first time. This is such a
> cluttered area that I am not really sure myself, thoug.
I understand. And again this is likely correct. I'll be glad for any
assistance here. My limited knowledge is the primary cause for lack of
relevant details I guess. What I checked looks like pfn_valid is a
generic function used by all arches but arm, which seems to be the
only one to implement CONFIG_HAVE_ARCH_PFN_VALID if I didn't miss
anything. So if this config is enabled on arm, it uses it's own
version of pfn_valid(). If not, I'd expect all arches behave the same.
That's where my assumption comes from.
> --
> Michal Hocko
> SUSE Labs
Powered by blists - more mailing lists