lists.openwall.net — Open Source and information security mailing list archives
Date: Wed, 4 Mar 2020 10:34:03 +0800
From: Shile Zhang <shile.zhang@...ux.alibaba.com>
To: Kirill Tkhai <ktkhai@...tuozzo.com>, Andrew Morton <akpm@...ux-foundation.org>,
	Pavel Tatashin <pasha.tatashin@...een.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Vlastimil Babka <vbabka@...e.cz>, Michal Hocko <mhocko@...e.com>
Subject: Re: [PATCH v2 1/1] mm: fix interrupt disabled long time inside deferred_init_memmap()

Hi Kirill,

Thanks for your quick reply!

On 2020/3/4 00:52, Kirill Tkhai wrote:
> On 03.03.2020 19:15, Shile Zhang wrote:
>> When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, the 'pgdatinit' kthread
>> initialises the deferred pages with local interrupts disabled. This
>> behaviour was introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts
>> while initializing deferred pages").
>>
>> Local interrupts stay disabled for a long time inside
>> deferred_init_memmap(), depending on the memory size.
>> On machines with NCPUS <= 2, the 'pgdatinit' kthread can be pinned to
>> the boot CPU; the tick timer is then stuck for a long time, which makes
>> the system wall time inaccurate.
>>
>> For example, dmesg showed:
>>
>> [    0.197975] node 0 initialised, 32170688 pages in 1ms
>>
>> Obviously, 1ms is unreasonable.
>>
>> Fix it by letting pending interrupts in inside the while loop. A
>> reasonable dmesg then looks like:
>>
>> [    1.069306] node 0 initialised, 32203456 pages in 894ms
> The way I understand the original problem that Pavel fixed:
>
> we need to disable irqs in deferred_init_memmap() since this function may
> be called in parallel with deferred_grow_zone(), called from an interrupt
> handler. So, Pavel added a lock to fix the race.
>
> If we temporarily unlock the lock, an interrupt is still possible,
> so my previous proposition brings the problem back.
>
> Now, thinking about it again, I think we just have to add:
>
> 	pgdat_resize_unlock(pgdat, &flags);
> 	pgdat_resize_lock(pgdat, &flags);
>
> instead of releasing interrupts, since if we just release them with the
> lock held, a call of interrupt->deferred_grow_zone() brings us to a
> deadlock.
>
> So, unlocking the lock is a must.

Yes, you're right! I missed this point. Thanks for your comment!

>
>> Signed-off-by: Shile Zhang <shile.zhang@...ux.alibaba.com>
>> ---
>>  mm/page_alloc.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3c4eb750a199..d3f337f2e089 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1809,8 +1809,12 @@ static int __init deferred_init_memmap(void *data)
>>  	 * that we can avoid introducing any issues with the buddy
>>  	 * allocator.
>>  	 */
>> -	while (spfn < epfn)
>> +	while (spfn < epfn) {
>>  		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
>> +		/* let in any pending interrupts */
>> +		local_irq_restore(flags);
>> +		local_irq_save(flags);
>> +	}
>>  zone_empty:
>>  	pgdat_resize_unlock(pgdat, &flags);
> I think we need here something like below (untested):
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 79e950d76ffc..323afa9a4db5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1828,7 +1828,7 @@ static int __init deferred_init_memmap(void *data)
>  {
>  	pg_data_t *pgdat = data;
>  	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
> -	unsigned long spfn = 0, epfn = 0, nr_pages = 0;
> +	unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
>  	unsigned long first_init_pfn, flags;
>  	unsigned long start = jiffies;
>  	struct zone *zone;
> @@ -1869,8 +1869,18 @@ static int __init deferred_init_memmap(void *data)
>  	 * that we can avoid introducing any issues with the buddy
>  	 * allocator.
>  	 */
> -	while (spfn < epfn)
> +	while (spfn < epfn) {
>  		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> +		/*
> +		 * Release interrupts every 1Gb to give a timer
> +		 * a possibility to advance jiffies.
> +		 */
> +		if (nr_pages - prev_nr_pages > (1UL << (30 - PAGE_SHIFT))) {
> +			prev_nr_pages = nr_pages;
> +			pgdat_resize_unlock(pgdat, &flags);
> +			pgdat_resize_lock(pgdat, &flags);
> +		}
> +	}
>  zone_empty:
>  	pgdat_resize_unlock(pgdat, &flags);
>
> (I believe the comment may be improved more).

Yeah, your patch is better! I tested your code and it works!
But it seems that 1GB still keeps the interrupts off too long, about 40ms in
my env (Intel(R) Xeon(R) 2.5GHz). I tried other sizes, and 1024 pages (4MB)
works well, as Andrew suggested before.
Could you please help to review it again?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..5def66d3ffcd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1768,7 +1768,7 @@ static int __init deferred_init_memmap(void *data)
 {
 	pg_data_t *pgdat = data;
 	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-	unsigned long spfn = 0, epfn = 0, nr_pages = 0;
+	unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0;
 	unsigned long first_init_pfn, flags;
 	unsigned long start = jiffies;
 	struct zone *zone;
@@ -1809,8 +1809,18 @@ static int __init deferred_init_memmap(void *data)
 	 * that we can avoid introducing any issues with the buddy
 	 * allocator.
 	 */
-	while (spfn < epfn)
+	while (spfn < epfn) {
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+		/*
+		 * Restore pending interrupts every 1024 pages to give
+		 * the tick timer a chance to advance jiffies.
+		 */
+		if (nr_pages - prev_nr_pages > 1024) {
+			prev_nr_pages = nr_pages;
+			pgdat_resize_unlock(pgdat, &flags);
+			pgdat_resize_lock(pgdat, &flags);
+		}
+	}
 zone_empty:
 	pgdat_resize_unlock(pgdat, &flags);