lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <88af8545-14b7-08de-f121-e12295d5d5b9@oracle.com>
Date:   Wed, 18 Nov 2020 15:48:21 -0800
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Muchun Song <songmuchun@...edance.com>, corbet@....net,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, x86@...nel.org,
        hpa@...or.com, dave.hansen@...ux.intel.com, luto@...nel.org,
        peterz@...radead.org, viro@...iv.linux.org.uk,
        akpm@...ux-foundation.org, paulmck@...nel.org,
        mchehab+huawei@...nel.org, pawan.kumar.gupta@...ux.intel.com,
        rdunlap@...radead.org, oneukum@...e.com, anshuman.khandual@....com,
        jroedel@...e.de, almasrymina@...gle.com, rientjes@...gle.com,
        willy@...radead.org, osalvador@...e.de, mhocko@...e.com
Cc:     duanxiongchun@...edance.com, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v4 04/21] mm/hugetlb: Introduce nr_free_vmemmap_pages in
 the struct hstate

On 11/13/20 2:59 AM, Muchun Song wrote:
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> new file mode 100644
> index 000000000000..a6c9948302e2
> --- /dev/null
> +++ b/mm/hugetlb_vmemmap.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Free some vmemmap pages of HugeTLB
> + *
> + * Copyright (c) 2020, Bytedance. All rights reserved.
> + *
> + *     Author: Muchun Song <songmuchun@...edance.com>
> + *

Oscar has already made some suggestions to change comments.  I would suggest
changing the below text to something like the following.

> + * Nowadays we track the status of physical page frames using struct page
> + * structures arranged in one or more arrays. And here exists one-to-one
> + * mapping between the physical page frame and the corresponding struct page
> + * structure.
> + *
> + * The HugeTLB support is built on top of multiple page size support that
> + * is provided by most modern architectures. For example, x86 CPUs normally
> + * support 4K and 2M (1G if architecturally supported) page sizes. Every
> + * HugeTLB has more than one struct page structure. The 2M HugeTLB has 512
> + * struct page structure and 1G HugeTLB has 4096 struct page structures. But
> + * in the core of HugeTLB only uses the first 4 (Use of first 4 struct page
> + * structures comes from HUGETLB_CGROUP_MIN_ORDER.) struct page structures to
> + * store metadata associated with each HugeTLB. The rest of the struct page
> + * structures are usually read the compound_head field which are all the same
> + * value. If we can free some struct page memory to buddy system so that we
> + * can save a lot of memory.
> + *

struct page structures (page structs) are used to describe a physical page
frame.  By default, there is a one-to-one mapping from a page frame to
its corresponding page struct.

HugeTLB pages consist of multiple base page size pages and are supported by
many architectures. See hugetlbpage.rst in the Documentation directory for
more details.  On the x86 architecture, HugeTLB pages of size 2MB and 1GB
are currently supported.  Since the base page size on x86 is 4KB, a 2MB
HugeTLB page consists of 512 base pages and a 1GB HugeTLB page consists of
262144 base pages.  For each base page, there is a corresponding page struct.

Within the HugeTLB subsystem, only the first 4 page structs are used to
contain unique information about a HugeTLB page.  HUGETLB_CGROUP_MIN_ORDER
provides this upper limit.  The only 'useful' information in the remaining
page structs is the compound_head field, and this field is the same for all
tail pages.

By removing redundant page structs for HugeTLB pages, memory can be returned
to the buddy allocator for other uses.

> + * When the system boot up, every 2M HugeTLB has 512 struct page structures
> + * which size is 8 pages(sizeof(struct page) * 512 / PAGE_SIZE).
> + *
> + *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> + * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> + * |           |                     |     0     | -------------> |     0     |
> + * |           |                     |     1     | -------------> |     1     |
> + * |           |                     |     2     | -------------> |     2     |
> + * |           |                     |     3     | -------------> |     3     |
> + * |           |                     |     4     | -------------> |     4     |
> + * |     2M    |                     |     5     | -------------> |     5     |
> + * |           |                     |     6     | -------------> |     6     |
> + * |           |                     |     7     | -------------> |     7     |
> + * |           |                     +-----------+                +-----------+
> + * |           |
> + * |           |
> + * +-----------+
> + *
> + *

I think we want the description before the next diagram.

Reworded description here:

The value of compound_head is the same for all tail pages.  The first page of
page structs (page 0) associated with the HugeTLB page contains the 4 page
structs necessary to describe the HugeTLB page.  The only use of the remaining
pages of page structs (page 1 to page 7) is to provide the compound_head field,
which points to the head page.  Therefore, we can remap pages 2 to 7 to page 1.
Only 2 pages of page structs will then be used for each 2MB HugeTLB page.  This
will allow us to free the remaining 6 pages to the buddy allocator.

Here is how things look after remapping.

> + *
> + *    HugeTLB                  struct pages(8 pages)         page frame(8 pages)
> + * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> + * |           |                     |     0     | -------------> |     0     |
> + * |           |                     |     1     | -------------> |     1     |
> + * |           |                     |     2     | -------------> +-----------+
> + * |           |                     |     3     | -----------------^ ^ ^ ^ ^
> + * |           |                     |     4     | -------------------+ | | |
> + * |     2M    |                     |     5     | ---------------------+ | |
> + * |           |                     |     6     | -----------------------+ |
> + * |           |                     |     7     | -------------------------+
> + * |           |                     +-----------+
> + * |           |
> + * |           |
> + * +-----------+

-- 
Mike Kravetz
