Date:   Wed, 11 Jul 2018 14:49:47 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Cannon Matthews <cannonmatthews@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Nadia Yvette Chambers <nyc@...omorphy.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        andreslc@...gle.com, pfeiner@...gle.com, dmatlack@...gle.com,
        gthelen@...gle.com
Subject: Re: [PATCH] mm: hugetlb: don't zero 1GiB bootmem pages.

On Tue 10-07-18 13:46:57, Mike Kravetz wrote:
> On 07/10/2018 11:49 AM, Cannon Matthews wrote:
> > When using 1GiB pages during early boot, use the new
> > memblock_virt_alloc_try_nid_raw() function to allocate memory without
> > zeroing it.  Zeroing out hundreds or thousands of GiB in a single core
> > memset() call is very slow, and can make early boot last upwards of
> > 20-30 minutes on multi TiB machines.
> > 
> > To be safe, still zero the first sizeof(struct huge_bootmem_page) bytes
> > since this is used as a temporary storage place for this info until
> > gather_bootmem_prealloc() processes them later.
> > 
> > The rest of the memory does not need to be zero'd as the hugetlb pages
> > are always zero'd on page fault.
> > 
> > Tested: Booted with ~3800 1G pages, and it booted successfully in
> > roughly the same amount of time as with 0, as opposed to the 25+
> > minutes it would take before.
> > 
> 
> Nice improvement!
> 
> > Signed-off-by: Cannon Matthews <cannonmatthews@...gle.com>
> > ---
> >  mm/hugetlb.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 3612fbb32e9d..c93a2c77e881 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2101,7 +2101,7 @@ int __alloc_bootmem_huge_page(struct hstate *h)
> >  	for_each_node_mask_to_alloc(h, nr_nodes, node, &node_states[N_MEMORY]) {
> >  		void *addr;
> > 
> > -		addr = memblock_virt_alloc_try_nid_nopanic(
> > +		addr = memblock_virt_alloc_try_nid_raw(
> >  				huge_page_size(h), huge_page_size(h),
> >  				0, BOOTMEM_ALLOC_ACCESSIBLE, node);
> >  		if (addr) {
> > @@ -2109,7 +2109,12 @@ int __alloc_bootmem_huge_page(struct hstate *h)
> >  			 * Use the beginning of the huge page to store the
> >  			 * huge_bootmem_page struct (until gather_bootmem
> >  			 * puts them into the mem_map).
> > +			 *
> > +			 * memblock_virt_alloc_try_nid_raw returns non-zero'd
> > +			 * memory so zero out just enough for this struct, the
> > +			 * rest will be zero'd on page fault.
> >  			 */
> > +			memset(addr, 0, sizeof(struct huge_bootmem_page));
> 
> This forced me to look at the usage of huge_bootmem_page.  It is defined as:
> struct huge_bootmem_page {
> 	struct list_head list;
> 	struct hstate *hstate;
> #ifdef CONFIG_HIGHMEM
> 	phys_addr_t phys;
> #endif
> };
> 
> The list and hstate fields are set immediately after allocating the memory
> block here and elsewhere.  However, I can't find any code that sets phys.
> Although, it is potentially used in gather_bootmem_prealloc().  It appears
> powerpc used this field at one time, but no longer does.
> 
> Am I missing something?

If yes, then I am missing it as well. phys is a cool name to grep for...
Anyway, does it really make any sense to allow gigantic pages on HIGHMEM
systems in the first place?
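
For readers following the quoted patch, the "zero only the header" pattern
can be sketched in userspace C. This is an illustration only, not kernel
code: malloc() stands in for memblock_virt_alloc_try_nid_raw(), and
struct boot_page_header and alloc_page_raw() are hypothetical names that
only loosely mirror struct huge_bootmem_page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct huge_bootmem_page. */
struct boot_page_header {
	struct boot_page_header *next;	/* stand-in for the list_head linkage */
	void *hstate;			/* stand-in for struct hstate * */
	unsigned long long phys;	/* a field that no caller sets explicitly */
};

/*
 * Allocate a large block without clearing it, then zero only the header
 * stored at its start. malloc() plays the role of the _raw allocator here:
 * the returned memory has indeterminate contents.
 */
static struct boot_page_header *alloc_page_raw(size_t size, void *hstate,
					       struct boot_page_header **head)
{
	struct boot_page_header *m;

	assert(size >= sizeof(*m));
	m = malloc(size);
	if (!m)
		return NULL;

	/* Clear just the header; the rest of the block stays untouched. */
	memset(m, 0, sizeof(*m));

	/* Fill in the fields the caller does set and link the block in. */
	m->hstate = hstate;
	m->next = *head;
	*head = m;
	return m;
}
```

With the memset() in place, a field that is never assigned (phys in this
sketch) reads back as zero rather than as whatever the raw allocation
happened to contain.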

-- 
Michal Hocko
SUSE Labs
