lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210211213823.GB3597@localhost.localdomain>
Date:   Thu, 11 Feb 2021 22:38:23 +0100
From:   Oscar Salvador <osalvador@...e.de>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     David Hildenbrand <david@...hat.com>,
        Muchun Song <songmuchun@...edance.com>,
        Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/2] mm,page_alloc: Make alloc_contig_range handle
 free hugetlb pages

On Wed, Feb 10, 2021 at 05:16:29PM -0800, Mike Kravetz wrote:
> Should probably check for -EBUSY as this means someone started using
> the page while we were allocating a new one.  It would complicate the
> code to try and do the 'right thing'.  Right thing might be getting
> dissolving the new pool page and then trying to isolate this in use
> page.  Of course things could change again while you are doing that. :(

Yeah, I kept the error handling rather low just be clear about the
approach I was leaning towards, but yes, we should definitely check
for -EBUSY on dissolve_free_huge_page().

And it might be that dissolve_free_huge_page() returns -EBUSY on the old
page, and we need to dissolve the freshly allocated one as it is not
going to be used, and that might fail as well due to reserves for
instance, or maybe someone started using it?
I have to confess that I need to check the reservation code closer to be
aware of corner cases.

We used to try to be clever in such situations in memory-failure code,
but at some point you end up thinking "ok, how many retries are
considered enough?".
That code was trickier as we were handling uncorrected/corrected memory
errors, so we strived to do our best, but here we can be more flexible
as the whole thing is racy per se, and just fail if we find too many
obstacles.

I shall resume work early next week. 

Thanks for the tips ;-)

-- 
Oscar Salvador
SUSE L3

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ