Message-ID: <CA+CK2bCZEKsocuwN4Na1+YyviERztGdGDoQgWhxQF-9WxVVW5Q@mail.gmail.com>
Date: Thu, 13 Apr 2023 11:05:20 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
mike.kravetz@...cle.com, muchun.song@...ux.dev,
rientjes@...gle.com, souravpanda@...gle.com
Subject: Re: [PATCH v2] mm: hugetlb_vmemmap: provide stronger vmemmap allocation guarantees

On Wed, Apr 12, 2023 at 4:18 PM Michal Hocko <mhocko@...e.com> wrote:
>
> On Wed 12-04-23 13:13:02, Andrew Morton wrote:
> > Lots of questions (ie, missing information!)
> >
> > On Wed, 12 Apr 2023 19:59:39 +0000 Pasha Tatashin <pasha.tatashin@...een.com> wrote:
> >
> > > HugeTLB pages have a struct page optimization where struct pages for tail
> > > pages are freed. However, when HugeTLB pages are destroyed, the memory for
> > > struct pages (vmemmap) needs to be allocated again.
> > >
> > > Currently, the __GFP_NORETRY flag is used to allocate the memory for vmemmap,
> > > but given that this flag makes very little effort to actually reclaim
> > > memory, returning huge pages back to the system can be a problem.
> >
> > Are there any reports of this happening in the real world?
> >
> > > Let's
> > > use __GFP_RETRY_MAYFAIL instead. This flag also performs graceful
> > > reclaim without causing ooms, but at least it may perform a few retries,
> > > and will fail only when there is genuinely little unused memory
> > > in the system.
> >
> > If so, does this change help?
> >
> > If the allocation attempt fails, what are the consequences?
> >
> > What are the potential downsides to this change? Why did we choose
> > __GFP_NORETRY in the first place?
> >
> > What happens if we try harder (eg, GFP_KERNEL)?
>
> Mike was generous enough to make me remember
> https://lore.kernel.org/linux-mm/YCafit5ruRJ+SL8I@dhcp22.suse.cz/.
> GFP_KERNEL wouldn't make much difference because this is
> __GFP_THISNODE. But I do agree that the changelog should go into more
> detail about why we want to try harder now. I can imagine that
> shrinking hugetlb pool by a large amount of hugetlb pages might become a
> problem but is this really happening or is this a theoretical concern?

This is a theoretical concern. Freeing a 1G page requires 16M of free
memory. A machine might need to be reconfigured from one task to
another and release a large number of 1G pages back to the system. If
allocating the 16M fails, the release won't work.
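
For reference, the 16M figure can be reproduced with a short calculation.
This is only a sketch: it assumes 4 KiB base pages and a 64-byte struct
page (typical for x86_64, but neither is stated above), and the names
below are made up for illustration. The optimization keeps a small part
of the vmemmap in place, so the amount that actually has to be
reallocated is slightly below this upper bound.

```python
# Assumptions (not stated in the message): 4 KiB base pages and a
# 64-byte struct page, as on a typical x86_64 configuration.
BASE_PAGE_SIZE = 4 * 1024
STRUCT_PAGE_SIZE = 64

KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30


def vmemmap_bytes(huge_page_size: int) -> int:
    """Upper bound on the struct page memory describing one huge page."""
    nr_base_pages = huge_page_size // BASE_PAGE_SIZE
    return nr_base_pages * STRUCT_PAGE_SIZE


one_gig = vmemmap_bytes(GIB)      # 262144 struct pages * 64 bytes
two_meg = vmemmap_bytes(2 * MIB)  # 512 struct pages * 64 bytes

print(f"1G page: {one_gig // MIB} MiB of vmemmap")
print(f"2M page: {two_meg // KIB} KiB of vmemmap")
print(f"100 x 1G pages: {100 * one_gig // MIB} MiB of vmemmap")
```

Under these assumptions, releasing 100 1G pages needs on the order of
1600 MiB of vmemmap in total, and each individual release can fail if
its allocation fails.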

In an ideal scenario we should guarantee that this never fails: that
we can always free HugeTLB pages back to the system. At the very least
we could steal the memory for vmemmap from the page that is being
released.

Pasha