Message-ID: <20200401200855.d23xcwznr5cm67p2@ca-dmjordan1.us.oracle.com>
Date:   Wed, 1 Apr 2020 16:08:55 -0400
From:   Daniel Jordan <daniel.m.jordan@...cle.com>
To:     Pavel Tatashin <pasha.tatashin@...een.com>
Cc:     linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        mhocko@...e.com, linux-mm@...ck.org, dan.j.williams@...el.com,
        shile.zhang@...ux.alibaba.com, daniel.m.jordan@...cle.com,
        ktkhai@...tuozzo.com, david@...hat.com, jmorris@...ei.org,
        sashal@...nel.org
Subject: Re: [PATCH] mm: initialize deferred pages with interrupts enabled

On Wed, Apr 01, 2020 at 04:00:27PM -0400, Daniel Jordan wrote:
> On Wed, Apr 01, 2020 at 03:32:38PM -0400, Pavel Tatashin wrote:
> > Initializing struct pages is a long task and keeping interrupts disabled
> > for the duration of this operation introduces a number of problems.
> > 
> > 1. jiffies are not updated for long period of time, and thus incorrect time
> >    is reported. See proposed solution and discussion here:
> >    lkml/20200311123848.118638-1-shile.zhang@...ux.alibaba.com
> > 2. It prevents farther improving deferred page initialization by allowing
> 
>                                                                    not allowing
> >    inter-node multi-threading.
> 
>      intra-node
> 
> ...
> > After:
> > [    1.632580] node 0 initialised, 12051227 pages in 436ms
> 
> Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages")
> Reported-by: Shile Zhang <shile.zhang@...ux.alibaba.com>
> 
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@...een.com>
> 
> Freezing jiffies for a while during boot sounds like stable material to me, so
> 
> Cc: <stable@...r.kernel.org>    [4.17.x+]
> 
> 
> Can you please add a comment to mmzone.h above node_size_lock, something like
> 
>          * Must be held any time you expect node_start_pfn,
>          * node_present_pages, node_spanned_pages or nr_zones to stay constant.
> +        * Also synchronizes pgdat->first_deferred_pfn during deferred page
> +        * init.
>          ...
>         spinlock_t node_size_lock;
> 
> > @@ -1854,18 +1859,6 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
> >  		return false;
> >  
> >  	pgdat_resize_lock(pgdat, &flags);
> > -
> > -	/*
> > -	 * If deferred pages have been initialized while we were waiting for
> > -	 * the lock, return true, as the zone was grown.  The caller will retry
> > -	 * this zone.  We won't return to this function since the caller also
> > -	 * has this static branch.
> > -	 */
> > -	if (!static_branch_unlikely(&deferred_pages)) {
> > -		pgdat_resize_unlock(pgdat, &flags);
> > -		return true;
> > -	}
> > -
> 
> Huh, looks like this wasn't needed even before this change.
> 
> 
> The rest looks fine.
> 
> Reviewed-by: Daniel Jordan <daniel.m.jordan@...cle.com>

...except that I forgot about the touch_nmi_watchdog() calls.  I think you'd
need something like the patch below before yours.
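To sketch the end state I have in mind (not a literal diff -- the loop
bodies are abbreviated, and the names just follow mm/page_alloc.c): once
your patch enables interrupts in deferred_init_memmap(), its loop can
yield voluntarily, while deferred_grow_zone() still runs under
pgdat_resize_lock() with interrupts off and so has to keep poking the
watchdog instead:

```c
/* deferred_init_memmap(): runs in a kthread with interrupts enabled
 * after Pavel's patch, so it may sleep and should yield the CPU. */
while (spfn < epfn) {
	nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
	cond_resched();
}

/* deferred_grow_zone(): still called with the resize lock held and
 * interrupts disabled, so it cannot sleep; touch the NMI watchdog
 * to avoid soft-lockup splats instead. */
while (spfn < epfn) {
	nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
	touch_nmi_watchdog();
	/* ...existing nr_pages_needed / section-boundary checks... */
}
```

That's why the patch below pulls the watchdog calls up out of the shared
helpers: the two callers can then diverge independently.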

---8<---

From: Daniel Jordan <daniel.m.jordan@...cle.com>
Date: Fri, 27 Mar 2020 17:29:05 -0400
Subject: [PATCH] mm: call touch_nmi_watchdog() on max order boundaries in
 deferred init

deferred_init_memmap() disables interrupts the entire time, so it calls
touch_nmi_watchdog() periodically to avoid soft lockup splats.  Soon it
will run with interrupts enabled, at which point cond_resched() should
be used instead.

deferred_grow_zone() makes the same watchdog calls through code shared
with deferred init but will continue to run with interrupts disabled, so
it can't call cond_resched().

Pull the watchdog calls up to these two places to allow the first to be
changed later, independently of the second.  The frequency reduces from
twice per pageblock (init and free) to once per max order block.

Signed-off-by: Daniel Jordan <daniel.m.jordan@...cle.com>
---
 mm/page_alloc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 212734c4f8b0..4cf18c534233 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1639,7 +1639,6 @@ static void __init deferred_free_pages(unsigned long pfn,
 		} else if (!(pfn & nr_pgmask)) {
 			deferred_free_range(pfn - nr_free, nr_free);
 			nr_free = 1;
-			touch_nmi_watchdog();
 		} else {
 			nr_free++;
 		}
@@ -1669,7 +1668,6 @@ static unsigned long  __init deferred_init_pages(struct zone *zone,
 			continue;
 		} else if (!page || !(pfn & nr_pgmask)) {
 			page = pfn_to_page(pfn);
-			touch_nmi_watchdog();
 		} else {
 			page++;
 		}
@@ -1813,8 +1811,10 @@ static int __init deferred_init_memmap(void *data)
 	 * that we can avoid introducing any issues with the buddy
 	 * allocator.
 	 */
-	while (spfn < epfn)
+	while (spfn < epfn) {
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+		touch_nmi_watchdog();
+	}
 zone_empty:
 	pgdat_resize_unlock(pgdat, &flags);
 
@@ -1908,6 +1908,7 @@ deferred_grow_zone_locked(pg_data_t *pgdat, struct zone *zone,
 		first_deferred_pfn = spfn;
 
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+		touch_nmi_watchdog();
 
 		/* We should only stop along section boundaries */
 		if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION)
-- 
2.25.0
