linux-kernel - Re: [PATCH v2] mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e7171928-61aa-2897-b3d1-e5f826a4592c@google.com>
Date:   Fri, 14 Apr 2023 17:47:28 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     Michal Hocko <mhocko@...e.com>
cc:     Pasha Tatashin <pasha.tatashin@...een.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        mike.kravetz@...cle.com, muchun.song@...ux.dev,
        souravpanda@...gle.com
Subject: Re: [PATCH v2] mm: hugetlb_vmemmap: provide stronger vmemmap allocation
 guarantees

On Thu, 13 Apr 2023, Michal Hocko wrote:

> [...]
> > > > This is a theoretical concern. Freeing a 1G page requires 16M of free
> > > > memory. A machine might need to be reconfigured from one task to
> > > > another, and release a large number of 1G pages back to the system if
> > > > allocating 16M fails, the release won't work.
> > >
> > > This is really an important "detail" changelog should mention. While I
> > > am not really against that change I would much rather see that as a
> > > result of a real world fix rather than a theoretical concern. Mostly
> > > because a real life scenario would allow us to test the
> > > __GFP_RETRY_MAYFAIL effectivness. As that request might fail as well we
> > > just end up with a theoretical fix for a theoretical problem. Something
> > > that is easy to introduce but much harder to get rid of should we ever
> > > need to change __GFP_RETRY_MAYFAIL implementation for example.
> > 
> > I will add this to changelog in v3. If  __GFP_RETRY_MAYFAIL is
> > ineffective we will receive feedback once someone hits this problem.
> 
> I do not remember anybody hitting this with the current __GFP_NORETRY.
> So arguably there is nothing to be fixed ATM.
> 

I think we should still at least clear __GFP_NORETRY in this allocation: 
to be able to free 1GB hugepages back to the system we'd like the page 
allocator to at least exercise its normal order-0 allocation logic rather 
than exempting it from retrying reclaim by opting into __GFP_NORETRY.

I'd agree with the analysis in 
https://lore.kernel.org/linux-mm/YCafit5ruRJ+SL8I@dhcp22.suse.cz/ that 
either a cleared __GFP_NORETRY or a __GFP_RETRY_MAYFAIL makes logical 
sense.

We really *do* want to free these hugepages back to the system and the 
amount of memory freeing will always be more than the allocation for 
struct page.  The net result is more free memory.

If the allocation fails, we can't free 1GB back to the system on a 
saturated node if our first reclaim attempt didn't allow these struct 
pages to be allocated.  Stranding 1GB in the hugetlb pool that no 
userspace on the system can make use of at the time isn't very useful.