lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20201124131436.GX27488@dhcp22.suse.cz>
Date:   Tue, 24 Nov 2020 14:14:36 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Muchun Song <songmuchun@...edance.com>
Cc:     Jonathan Corbet <corbet@....net>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
        bp@...en8.de, x86@...nel.org, hpa@...or.com,
        dave.hansen@...ux.intel.com, luto@...nel.org,
        Peter Zijlstra <peterz@...radead.org>, viro@...iv.linux.org.uk,
        Andrew Morton <akpm@...ux-foundation.org>, paulmck@...nel.org,
        mchehab+huawei@...nel.org, pawan.kumar.gupta@...ux.intel.com,
        Randy Dunlap <rdunlap@...radead.org>, oneukum@...e.com,
        anshuman.khandual@....com, jroedel@...e.de,
        Mina Almasry <almasrymina@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Oscar Salvador <osalvador@...e.de>,
        "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
        Xiongchun duan <duanxiongchun@...edance.com>,
        linux-doc@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [External] Re: [PATCH v6 09/16] mm/hugetlb: Defer freeing of
 HugeTLB pages

On Tue 24-11-20 20:45:30, Muchun Song wrote:
> On Tue, Nov 24, 2020 at 7:51 PM Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Tue 24-11-20 17:52:52, Muchun Song wrote:
> > > In the subsequent patch, we will allocate the vmemmap pages when free
> > > HugeTLB pages. But update_and_free_page() is called from a non-task
> > > context(and hold hugetlb_lock), so we can defer the actual freeing in
> > > a workqueue to prevent use GFP_ATOMIC to allocate the vmemmap pages.
> >
> > This has been brought up earlier without any satisfying answer. Do we
> > really have bother with the freeing from the pool and reconstructing the
> > vmemmap page tables? Do existing usecases really require such a dynamic
> > behavior? In other words, wouldn't it be much simpler to allow to use
> 
> If someone wants to free a HugeTLB page, there is no way to do that if we
> do not allow this behavior.

Right. The question is how much that matters for the _initial_ feature
submission. Is this restriction so important that it would render it
unsuable?

> When do we need this? On our server, we will
> allocate a lot of HugeTLB pages for SPDK or virtualization. Sometimes,
> we want to debug some issues and want to apt install some debug tools,
> but if the host has little memory and the install operation can be failed
> because of no memory. In this time, we can try to free some HugeTLB
> pages to buddy in order to continue debugging. So maybe we need this.

Or maybe you can still allocate hugetlb pages for debugging in runtime
and try to free those when you need to.

> > hugetlb pages with sparse vmemmaps only for the boot time reservations
> > and never allow them to be freed back to the allocator. This is pretty
> > restrictive, no question about that, but it would drop quite some code
> 
> Yeah, if we do not allow freeing the HugeTLB page to buddy, it actually
> can drop some code. But I think that it only drop this one and next one
> patch. It seems not a lot. And if we drop this patch, we need to add some
> another code to do the boot time reservations and other code to disallow
> freeing HugeTLB pages.

you need a per hugetlb page flag to note the sparse vmemmap anyway so
the freeing path should be a trivial check for the flag. Early boot
reservation. Special casing for the early boot reservation shouldn't be
that hard either. But I haven't checked closely.

> So why not support freeing now.

Because it adds some non trivial challenges which would be better to
deal with with a stable and tested and feature limited implementation.
The most obvious one is the problem with vmemmap allocations when
freeing hugetlb page. Others like vmemmap manipulation is quite some
code but no surprises. Btw. that should be implemented in vmemmap proper
and ready for other potential users. But this is a minor detail.

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ