[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4ede47fd-d01e-6d9c-288a-2ec97b5392af@redhat.com>
Date: Thu, 4 Mar 2021 11:32:29 +0100
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Muchun Song <songmuchun@...edance.com>,
Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/2] mm: Make alloc_contig_range handle free hugetlb
pages
>> I think we should not swallo such return values in
>> isolate_or_dissolve_huge_page() and instead properly report esp. -ENOMEM
>> properly up this callchain now. Otherwise we'll end up retrying / reporting
>> -EBUSY, which is misleading.
>
> I am not sure I follow you here.
> So, atm, alloc_and_dissolve_huge_page can either generate -ENOMEM or
> -EBUSY wrt. error codes.
> -ENOMEM when we cannot allocate a page, and -EBUSY when we raced with
> someone.
> You mean to only report ENOMEM down the chain?
Yes, fatal errors.
>
>> From isolate_migratepages_range()/isolate_migratepages_block() we'll keep
>> reporting "pfn > 0".
>>
>> a) In isolate_migratepages_range() we'll keep iterating over pageblocks
>> although we should just fail with -ENOMEM right away.
>>
>> b) In __alloc_contig_migrate_range() we'll keep retrying up to 5 times
>> although we should just fail with -ENOMEM. We end up returning "-EBUSY"
>> after retrying.
>>
>> c) In alloc_contig_range() we'll continue trying to isolate although we
>> should just return -ENOMEM.
>
> Yes, "fatal" errors get masked, and hence we treat everything as "things
> are busy, let us try again", and this is rather unforunate.
>
>> I think we have should start returning proper errors from
>> isolate_migratepages_range()/isolate_migratepages_block() on critical issues
>> (-EINTR, -ENOMEM) instead of going via "!pfn vs. pfn" and retrying on "pfn".
>>
>> So we should then fail with -ENOMEM during isolate_migratepages_range()
>> cleanly, just as we would do when we get -ENOMEM during migrate_pages().
>
> I guess we could rework the interface and make isolate_migratepages_range and
> isolate_migratepages_block to report the right thing.
> I yet have to check that this does not mess up a lot with the compaction
> interface.
>
> But overall I agree with your point here, and I am willing to to tackle
> this.
>
> The question is whether we want to do this as part of this series, or
> after this series gets picked up.
> IMHO, we could do it after, as a follow-up, unless you feel strong about
> it.
>
> What do you think?
I think this is now the second fatal error we can have (-EINTR,
-ENOMEM), thus the current interface (return "NULL" on fatal errros) no
longer works properly.
No strong opinion about fixing this up on top - could be it's cleaner to
just do it right away.
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists