Date:   Mon, 21 Dec 2020 19:25:15 +0800
From:   Muchun Song <songmuchun@...edance.com>
To:     Oscar Salvador <osalvador@...e.de>
Cc:     Jonathan Corbet <corbet@....net>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
        bp@...en8.de, x86@...nel.org, hpa@...or.com,
        dave.hansen@...ux.intel.com, luto@...nel.org,
        Peter Zijlstra <peterz@...radead.org>, viro@...iv.linux.org.uk,
        Andrew Morton <akpm@...ux-foundation.org>, paulmck@...nel.org,
        mchehab+huawei@...nel.org, pawan.kumar.gupta@...ux.intel.com,
        Randy Dunlap <rdunlap@...radead.org>, oneukum@...e.com,
        anshuman.khandual@....com, jroedel@...e.de,
        Mina Almasry <almasrymina@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...e.com>,
        "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
        David Hildenbrand <david@...hat.com>, naoya.horiguchi@....com,
        Xiongchun duan <duanxiongchun@...edance.com>,
        linux-doc@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [External] Re: [PATCH v10 03/11] mm/hugetlb: Free the vmemmap
 pages associated with each HugeTLB page

On Mon, Dec 21, 2020 at 5:11 PM Oscar Salvador <osalvador@...e.de> wrote:
>
> On Thu, Dec 17, 2020 at 08:12:55PM +0800, Muchun Song wrote:
> > +static inline void free_bootmem_page(struct page *page)
> > +{
> > +     unsigned long magic = (unsigned long)page->freelist;
> > +
> > +     /*
> > +      * The reserve_bootmem_region sets the reserved flag on bootmem
> > +      * pages.
> > +      */
> > +     VM_WARN_ON(page_ref_count(page) != 2);
> > +
> > +     if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> > +             put_page_bootmem(page);
> > +     else
> > +             VM_WARN_ON(1);
>
> Ideally, I think we want to see how the page looks, since its state
> is not what we expected, so maybe join both conditions and use dump_page().

Agree. Will do. Thanks.

>
> > + * By removing redundant page structs for HugeTLB pages, memory can returned to
>                                                                      ^^ be

Thanks.

> > + * the buddy allocator for other uses.
>
> [...]
>
> > +void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> > +{
> > +     unsigned long vmemmap_addr = (unsigned long)head;
> > +
> > +     if (!free_vmemmap_pages_per_hpage(h))
> > +             return;
> > +
> > +     vmemmap_remap_free(vmemmap_addr + RESERVE_VMEMMAP_SIZE,
> > +                        free_vmemmap_pages_size_per_hpage(h));
>
> I am not sure what others think, but I would like to see vmemmap_remap_free taking
> three arguments: start, end, and reuse addr, e.g:
>
>  void free_huge_page_vmemmap(struct hstate *h, struct page *head)
>  {
>       unsigned long vmemmap_addr = (unsigned long)head;
>       unsigned long vmemmap_end, vmemmap_reuse;
>
>       if (!free_vmemmap_pages_per_hpage(h))
>               return;
>
>       vmemmap_addr += RESERVE_VMEMMAP_SIZE;
>       vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
>       vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
>
>       vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
>  }
>
> The reason for me to do this is to let the callers of vmemmap_remap_free decide
> __what__ they want to remap.
>
> More on this below.
>
>
> > +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> > +                           unsigned long end,
> > +                           struct vmemmap_remap_walk *walk)
> > +{
> > +     pte_t *pte;
> > +
> > +     pte = pte_offset_kernel(pmd, addr);
> > +
> > +     if (walk->reuse_addr == addr) {
> > +             BUG_ON(pte_none(*pte));
> > +             walk->reuse_page = pte_page(*pte++);
> > +             addr += PAGE_SIZE;
> > +     }
>
> Although it is quite obvious, a brief comment here pointing out what we are
> doing, and that this is meant to be set only once, would be nice.

OK. Will do.
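
To make the "set only once" point concrete, here is a rough userspace
model of the walk, not kernel code. Everything in it is an assumption made
for illustration: frames are plain ints, model_walk and model_pte_range are
hypothetical stand-ins for struct vmemmap_remap_walk and
vmemmap_pte_range(), and PAGE_SIZE is fixed at 4096.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumed page size for the model */
#define MAX_FREED 16		/* arbitrary bound for the model */

/* Hypothetical stand-in for struct vmemmap_remap_walk. */
struct model_walk {
	unsigned long reuse_addr;	/* address whose backing frame is kept */
	int reuse_frame;		/* -1 until recorded; set exactly once */
	int freed[MAX_FREED];		/* frames that would be freed later */
	size_t nr_freed;
};

/*
 * ptes[i] holds the frame number backing address base + i * PAGE_SIZE.
 * Walk [addr, end): record the reuse frame when addr hits reuse_addr,
 * then point every remaining entry at that frame and collect the frame
 * it used to map.
 */
static void model_pte_range(int *ptes, unsigned long base,
			    unsigned long addr, unsigned long end,
			    struct model_walk *walk)
{
	int *pte = &ptes[(addr - base) / PAGE_SIZE];

	if (walk->reuse_addr == addr) {
		/* The reuse frame must exist and must not be set twice. */
		assert(*pte >= 0 && walk->reuse_frame < 0);
		walk->reuse_frame = *pte++;
		addr += PAGE_SIZE;
	}

	for (; addr < end; addr += PAGE_SIZE, pte++) {
		assert(walk->reuse_frame >= 0);
		assert(walk->nr_freed < MAX_FREED);
		walk->freed[walk->nr_freed++] = *pte;
		*pte = walk->reuse_frame;
	}
}
```

In this model the first entry of the range is the one that gets recorded,
and every later entry ends up pointing at the same frame.
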

>
>
> > +static void vmemmap_remap_range(unsigned long start, unsigned long end,
> > +                             struct vmemmap_remap_walk *walk)
> > +{
> > +     unsigned long addr = start - PAGE_SIZE;
> > +     unsigned long next;
> > +     pgd_t *pgd;
> > +
> > +     VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
> > +     VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
> > +
> > +     walk->reuse_page = NULL;
> > +     walk->reuse_addr = addr;
>
> With the change I suggested above, struct vmemmap_remap_walk should be
> initialized at once in vmemmap_remap_free, so this should no longer be needed.

You are right.

> (And btw, you do not need to set reuse_page to NULL, the way you init the struct
> in vmemmap_remap_free makes sure to null any field you do not explicitly set).
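
Right, C guarantees that members left out of a designated initializer are
zero-initialized, so the explicit NULL is redundant. A tiny userspace
sketch of that rule follows; init_model and init_model_make are
hypothetical names for illustration, with the struct a cut-down stand-in
for struct vmemmap_remap_walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct vmemmap_remap_walk. */
struct init_model {
	unsigned long reuse_addr;
	void *reuse_page;
	void *vmemmap_pages;
};

/* Build a walk the way vmemmap_remap_free would, naming only two fields. */
static struct init_model init_model_make(unsigned long reuse, void *list)
{
	struct init_model walk = {
		.reuse_addr	= reuse,
		.vmemmap_pages	= list,
		/* .reuse_page is not named, so it starts out as NULL */
	};

	return walk;
}
```
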
>
>
> > +static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
> > +                           struct vmemmap_remap_walk *walk)
> > +{
> > +     /*
> > +      * Make the tail pages are mapped with read-only to catch
> > +      * illegal write operation to the tail pages.
>         "Remap the tail pages as read-only to ..."

Thanks.

>
> > +      */
> > +     pgprot_t pgprot = PAGE_KERNEL_RO;
> > +     pte_t entry = mk_pte(walk->reuse_page, pgprot);
> > +     struct page *page;
> > +
> > +     page = pte_page(*pte);
>
>  struct page *page = pte_page(*pte);
>
> since you did the same for the other two.

Yeah. Will change to this.

>
> > +     list_add(&page->lru, walk->vmemmap_pages);
> > +
> > +     set_pte_at(&init_mm, addr, pte, entry);
> > +}
> > +
> > +/**
> > + * vmemmap_remap_free - remap the vmemmap virtual address range
> > + *                      [start, start + size) to the page which
> > + *                      [start - PAGE_SIZE, start) is mapped,
> > + *                      then free vmemmap pages.
> > + * @start:   start address of the vmemmap virtual address range
> > + * @size:    size of the vmemmap virtual address range
> > + */
> > +void vmemmap_remap_free(unsigned long start, unsigned long size)
> > +{
> > +     unsigned long end = start + size;
> > +     LIST_HEAD(vmemmap_pages);
> > +
> > +     struct vmemmap_remap_walk walk = {
> > +             .remap_pte      = vmemmap_remap_pte,
> > +             .vmemmap_pages  = &vmemmap_pages,
> > +     };
>
> As stated above, this would become:
>
>  void vmemmap_remap_free(unsigned long start, unsigned long end,
>                          unsigned long reuse)
>  {
>        LIST_HEAD(vmemmap_pages);
>        struct vmemmap_remap_walk walk = {
>                .reuse_addr = reuse,
>                .remap_pte = vmemmap_remap_pte,
>                .vmemmap_pages = &vmemmap_pages,
>        };
>
>   You might have had your reasons to do this way, but this looks more natural
>   to me, with the plus that callers of vmemmap_remap_free can specify
>   what they want to remap.

Should we add a BUG_ON in vmemmap_remap_free() for now?

        BUG_ON(reuse != start - PAGE_SIZE);
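
For reference, with the three-argument form from your example, the reuse
address is the page just before start, so that is the relationship such a
check would have to enforce. A rough userspace sketch of the arithmetic,
not kernel code: remap_range, hugepage_remap_range and reuse_is_adjacent
are hypothetical names, and the constants (4 KiB pages, two reserved
vmemmap pages) are assumptions made for the example.

```c
#include <assert.h>

#define PAGE_SIZE		4096UL			/* assumed */
#define RESERVE_VMEMMAP_SIZE	(2 * PAGE_SIZE)		/* assumed: two pages kept */

/* Hypothetical bundle of the three arguments to vmemmap_remap_free. */
struct remap_range {
	unsigned long start;	/* first vmemmap address to remap */
	unsigned long end;	/* one past the last address to remap */
	unsigned long reuse;	/* address whose page backs the whole range */
};

/* Mirror the caller-side computation from the suggested free_huge_page_vmemmap. */
static struct remap_range hugepage_remap_range(unsigned long head,
					       unsigned long free_size)
{
	struct remap_range r;

	r.start = head + RESERVE_VMEMMAP_SIZE;
	r.end = r.start + free_size;
	r.reuse = r.start - PAGE_SIZE;

	return r;
}

/* The condition a sanity check could assert: reuse sits right before start. */
static int reuse_is_adjacent(const struct remap_range *r)
{
	return r->reuse == r->start - PAGE_SIZE;
}
```
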

>
>
> --
> Oscar Salvador
> SUSE L3



-- 
Yours,
Muchun
