linux-kernel - Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210211213823.GB3597@localhost.localdomain>
Date:   Thu, 11 Feb 2021 22:38:23 +0100
From:   Oscar Salvador <osalvador@...e.de>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     David Hildenbrand <david@...hat.com>,
        Muchun Song <songmuchun@...edance.com>,
        Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle
 free hugetlb pages

On Wed, Feb 10, 2021 at 05:16:29PM -0800, Mike Kravetz wrote:
> Should probably check for -EBUSY as this means someone started using
> the page while we were allocating a new one.  It would complicate the
> code to try and do the 'right thing'.  Right thing might be getting
> dissolving the new pool page and then trying to isolate this in use
> page.  Of course things could change again while you are doing that. :(

Yeah, I kept the error handling rather low just be clear about the
approach I was leaning towards, but yes, we should definitely check
for -EBUSY on dissolve_free_huge_page().

And it might be that dissolve_free_huge_page() returns -EBUSY on the old
page, and we need to dissolve the freshly allocated one as it is not
going to be used, and that might fail as well due to reserves for
instance, or maybe someone started using it?
I have to confess that I need to check the reservation code closer to be
aware of corner cases.

We used to try to be clever in such situations in memory-failure code,
but at some point you end up thinking "ok, how many retries are
considered enough?".
That code was trickier as we were handling uncorrected/corrected memory
errors, so we strived to do our best, but here we can be more flexible
as the whole thing is racy per se, and just fail if we find too many
obstacles.

I shall resume work early next week. 

Thanks for the tips ;-)

-- 
Oscar Salvador
SUSE L3