Message-ID: <CACT4Y+YnZ2FhpmxNNmzGmmOQGqrsZShStgg4n_AYx41cUVnXeQ@mail.gmail.com>
Date: Fri, 10 Feb 2017 11:24:45 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Andrey Ryabinin <aryabinin@...tuozzo.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
"x86@...nel.org" <x86@...nel.org>,
Tobias Regnery <tobias.regnery@...il.com>,
"Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
Alexander Potapenko <glider@...gle.com>,
kasan-dev <kasan-dev@...glegroups.com>,
LKML <linux-kernel@...r.kernel.org>,
stable <stable@...r.kernel.org>
Subject: Re: [PATCH] x86/mm/ptdump: Fix soft lockup in page table walker.
On Fri, Feb 10, 2017 at 10:54 AM, Andrey Ryabinin
<aryabinin@...tuozzo.com> wrote:
> CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow.
> In that case ptdump_walk_pgd_level_core() takes a lot of time to
> walk across all page tables, and doing this without
> rescheduling causes soft lockups:
>
> NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1]
> ...
> Call Trace:
> ptdump_walk_pgd_level_core+0x40c/0x550
> ptdump_walk_pgd_level_checkwx+0x17/0x20
> mark_rodata_ro+0x13b/0x150
> kernel_init+0x2f/0x120
> ret_from_fork+0x2c/0x40
>
> I guess that this issue might arise even without KASAN on huge
> machines with several terabytes of RAM.
>
> Stick cond_resched() in pgd loop to fix this.
>
> Reported-by: Tobias Regnery <tobias.regnery@...il.com>
> Signed-off-by: Andrey Ryabinin <aryabinin@...tuozzo.com>
> Cc: <stable@...r.kernel.org>
> ---
> arch/x86/mm/dump_pagetables.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
> index ea9c49a..8aa6bea 100644
> --- a/arch/x86/mm/dump_pagetables.c
> +++ b/arch/x86/mm/dump_pagetables.c
> @@ -15,6 +15,7 @@
> #include <linux/debugfs.h>
> #include <linux/mm.h>
> #include <linux/init.h>
> +#include <linux/sched.h>
> #include <linux/seq_file.h>
>
> #include <asm/pgtable.h>
> @@ -406,6 +407,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
> } else
> note_page(m, &st, __pgprot(0), 1);
>
> + cond_resched();
This is the right thing to do per se, but I am concerned that people
will now just suffer from slow boot (it can take literally minutes),
will not realize the root cause or that it's fixable (e.g. with
rodata=n), and will probably just blame KASAN for the slowness.
Could we default this rodata check to n under KASAN? Or at least print
an explanatory warning message before marking rodata (it should be
printed right before the "hang", so if you stare at it for a minute
during each boot you realize that it may be related)? Or something
along these lines. FWIW in my builds I just always disable the check.
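
To make the second suggestion concrete, here is a rough userspace sketch, not kernel code and not part of the patch. It assumes stand-ins for pr_info() and cond_resched() (mapped to fprintf() and sched_yield()), and the names walk_top_level(), check_wx_verbose(), and kasan_enabled are hypothetical. It only illustrates the shape: print an explanatory message right before the long walk, then yield once per top-level entry as the patch does.

```c
#include <sched.h>
#include <stdio.h>

/* Userspace stand-ins (assumptions): the kernel's pr_info() and
 * cond_resched() are not available outside the kernel. */
#define pr_info(...)   fprintf(stderr, __VA_ARGS__)
#define cond_resched() sched_yield()

/* 512 top-level entries, as with PTRS_PER_PGD on x86-64. */
#define NR_TOP_ENTRIES 512

/* Hypothetical walker: visits every top-level entry and yields after
 * each one so a long walk does not monopolize the CPU.
 * Returns the number of entries visited. */
static unsigned long walk_top_level(void (*visit)(unsigned long idx))
{
	unsigned long i;

	for (i = 0; i < NR_TOP_ENTRIES; i++) {
		if (visit)
			visit(i);
		cond_resched();
	}
	return i;
}

/* Hypothetical wrapper: announce the slow check before starting it,
 * so the message is the last thing on screen during the apparent hang.
 * kasan_enabled is a plain flag here, not a real kernel symbol. */
static unsigned long check_wx_verbose(int kasan_enabled,
				      void (*visit)(unsigned long idx))
{
	if (kasan_enabled)
		pr_info("Walking page tables for the rodata check; "
			"with KASAN this can take minutes\n");
	return walk_top_level(visit);
}
```

Calling check_wx_verbose(1, NULL) prints the notice once and then walks all 512 entries. In the kernel the message would have to be emitted from the mark_rodata_ro() path shown in the trace above; where exactly is left open here.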