Date:   Fri, 20 Nov 2020 17:37:09 +0800
From:   Muchun Song <songmuchun@...edance.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Jonathan Corbet <corbet@....net>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
        bp@...en8.de, x86@...nel.org, hpa@...or.com,
        dave.hansen@...ux.intel.com, luto@...nel.org,
        Peter Zijlstra <peterz@...radead.org>, viro@...iv.linux.org.uk,
        Andrew Morton <akpm@...ux-foundation.org>, paulmck@...nel.org,
        mchehab+huawei@...nel.org, pawan.kumar.gupta@...ux.intel.com,
        Randy Dunlap <rdunlap@...radead.org>, oneukum@...e.com,
        anshuman.khandual@....com, jroedel@...e.de,
        Mina Almasry <almasrymina@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Oscar Salvador <osalvador@...e.de>,
        "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
        Xiongchun duan <duanxiongchun@...edance.com>,
        linux-doc@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [External] Re: [PATCH v5 11/21] mm/hugetlb: Allocate the vmemmap
 pages associated with each hugetlb page

On Fri, Nov 20, 2020 at 5:28 PM Michal Hocko <mhocko@...e.com> wrote:
>
> On Fri 20-11-20 16:51:59, Muchun Song wrote:
> > On Fri, Nov 20, 2020 at 4:11 PM Michal Hocko <mhocko@...e.com> wrote:
> > >
> > > On Fri 20-11-20 14:43:15, Muchun Song wrote:
> > > [...]
> > > > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > > > index eda7e3a0b67c..361c4174e222 100644
> > > > --- a/mm/hugetlb_vmemmap.c
> > > > +++ b/mm/hugetlb_vmemmap.c
> > > > @@ -117,6 +117,8 @@
> > > >  #define RESERVE_VMEMMAP_NR           2U
> > > >  #define RESERVE_VMEMMAP_SIZE         (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
> > > >  #define TAIL_PAGE_REUSE                      -1
> > > > +#define GFP_VMEMMAP_PAGE             \
> > > > +     (GFP_KERNEL | __GFP_NOFAIL | __GFP_MEMALLOC)
> > >
> > > This is really dangerous! __GFP_MEMALLOC would allow a complete memory
> > > depletion. I am not even sure triggering the OOM killer is a reasonable
> > > behavior. It is just unexpected that shrinking a hugetlb pool can have
> > > destructive side effects. I believe it would be more reasonable to
> > > simply refuse to shrink the pool if we cannot free those pages up. This
> > > sucks as well but it isn't destructive at least.
> >
> > I found the description of __GFP_MEMALLOC in the kernel doc.
> >
> > %__GFP_MEMALLOC allows access to all memory. This should only be used when
> > the caller guarantees the allocation will allow more memory to be freed
> > very shortly.
> >
> > Our situation is in line with the description above. We will shortly free
> > a HugeTLB page to the buddy allocator, which is much larger than what we
> > allocated.
>
> Yes that is a part of the description. But read it in its full entirety.
>  * %__GFP_MEMALLOC allows access to all memory. This should only be used when
>  * the caller guarantees the allocation will allow more memory to be freed
>  * very shortly e.g. process exiting or swapping. Users either should
>  * be the MM or co-ordinating closely with the VM (e.g. swap over NFS).
>  * Users of this flag have to be extremely careful to not deplete the reserve
>  * completely and implement a throttling mechanism which controls the
>  * consumption of the reserve based on the amount of freed memory.
>  * Usage of a pre-allocated pool (e.g. mempool) should be always considered
>  * before using this flag.
>
> GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_HIGH

We want to free the HugeTLB page to the buddy allocator, but before that
we need to allocate some pages as vmemmap pages, so we cannot handle
allocation failures here. I think we should replace
__GFP_RETRY_MAYFAIL with __GFP_NOFAIL:

GFP_KERNEL | __GFP_NOFAIL | __GFP_HIGH

This meets our needs here. Thanks.
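
To make the difference between the three masks in this thread easier to
see, here is a minimal userspace sketch. It is only a model: the MODEL_*
names, the bit values and the two helper functions are hypothetical and
do not match the kernel's real gfp encodings. It only shows which flags
each mask carries.

```c
#include <stdbool.h>

/*
 * Userspace model only. The bit values below are hypothetical
 * placeholders, not the kernel's real gfp flag encodings.
 */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_KERNEL        0x01u
#define MODEL_GFP_HIGH          0x02u
#define MODEL_GFP_MEMALLOC      0x04u
#define MODEL_GFP_NOFAIL        0x08u
#define MODEL_GFP_RETRY_MAYFAIL 0x10u

/* Mask from the v5 hunk quoted above. */
#define MODEL_VMEMMAP_V5 \
	(MODEL_GFP_KERNEL | MODEL_GFP_NOFAIL | MODEL_GFP_MEMALLOC)

/* Mask suggested in the review. */
#define MODEL_VMEMMAP_SUGGESTED \
	(MODEL_GFP_KERNEL | MODEL_GFP_RETRY_MAYFAIL | MODEL_GFP_HIGH)

/* Mask proposed in this reply. */
#define MODEL_VMEMMAP_PROPOSED \
	(MODEL_GFP_KERNEL | MODEL_GFP_NOFAIL | MODEL_GFP_HIGH)

/* True if the caller must be prepared for the allocation to fail. */
static bool model_may_fail(model_gfp_t mask)
{
	return (mask & MODEL_GFP_NOFAIL) == 0;
}

/* True if the mask asks for access to all memory, reserves included. */
static bool model_uses_memalloc(model_gfp_t mask)
{
	return (mask & MODEL_GFP_MEMALLOC) != 0;
}
```

In this model only the v5 mask carries the MEMALLOC bit, and only the
suggested mask may fail; the proposed mask keeps the no-fail behaviour
while dropping MEMALLOC.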

>
> sounds like a more reasonable fit to me.
>
> --
> Michal Hocko
> SUSE Labs



-- 
Yours,
Muchun
