Message-ID: <CA+G9fYudT63yZrkWG+mfKHTcn5mP+Ay6hraEQy3G_4jufztrrA@mail.gmail.com>
Date: Fri, 10 Jul 2020 23:18:03 +0530
From: Naresh Kamboju <naresh.kamboju@...aro.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-stable <stable@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>, Arnd Bergmann <arnd@...db.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
lkft-triage@...ts.linaro.org, Chris Down <chris@...isdown.name>,
Michel Lespinasse <walken@...gle.com>,
Fan Yang <Fan_Yang@...u.edu.cn>,
Brian Geffon <bgeffon@...gle.com>,
Anshuman Khandual <anshuman.khandual@....com>,
Will Deacon <will@...nel.org>,
Catalin Marinas <catalin.marinas@....com>, pugaowei@...il.com,
Jerome Glisse <jglisse@...hat.com>,
Joel Fernandes <joel@...lfernandes.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Hugh Dickins <hughd@...gle.com>,
Al Viro <viro@...iv.linux.org.uk>, Tejun Heo <tj@...nel.org>,
Sasha Levin <sashal@...nel.org>
Subject: Re: WARNING: at mm/mremap.c:211 move_page_tables in i386
On Fri, 10 Jul 2020 at 10:55, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Thu, Jul 9, 2020 at 9:29 PM Naresh Kamboju <naresh.kamboju@...aro.org> wrote:
> >
> > Your patch applied and re-tested.
> > warning triggered 10 times.
> >
> > old: bfe00000-c0000000 new: bfa00000 (val: 7d530067)
>
> Hmm.. It's not even the overlapping case, it's literally just "move
> exactly 2MB of page tables exactly one pmd down". Which should be the
> nice efficient case where we can do it without modifying the lower
> page tables at all, we just move the PMD entry.
>
> There shouldn't be anything in the new address space from bfa00000-bfdfffff.
>
> That PMD value obviously says differently, but it looks like a nice
> normal PMD value, nothing bad there.
>
> I'm starting to think that the issue might be that this is because the
> stack segment is special. Not only does it have the growsdown flag,
> but that whole thing has the magic guard page logic.
>
> So I wonder if we have installed a guard page _just_ below the old
> stack, so that we have populated that pmd because of that.
>
> We used to have an _actual_ guard page and then play nasty games with
> vm_start logic. We've gotten rid of that, though, and now we have that
> "stack_guard_gap" logic that _should_ mean that vm_start is always
> exact and proper (and that free_pgtables() should have emptied it, but
> maybe we have some case we forgot about).
>
> > [ 741.511684] WARNING: CPU: 1 PID: 15173 at mm/mremap.c:211 move_page_tables.cold+0x0/0x2b
> > [ 741.598159] Call Trace:
> > [ 741.600694] setup_arg_pages+0x22b/0x310
> > [ 741.621687] load_elf_binary+0x31e/0x10f0
> > [ 741.633839] __do_execve_file+0x5a8/0xbf0
> > [ 741.637893] __ia32_sys_execve+0x2a/0x40
> > [ 741.641875] do_syscall_32_irqs_on+0x3d/0x2c0
> > [ 741.657660] do_fast_syscall_32+0x60/0xf0
> > [ 741.661691] do_SYSENTER_32+0x15/0x20
> > [ 741.665373] entry_SYSENTER_32+0x9f/0xf2
> > [ 741.734151] old: bfe00000-c0000000 new: bfa00000 (val: 7d530067)
>
> Nothing looks bad, and the ELF loading phase memory map should be
> really quite simple.
>
> The only half-way unusual thing is that you have basically exactly 2MB
> of stack at execve time (easy enough to tune by just setting argv/env
> right), and it's moved down by exactly 2MB.
>
> And that latter thing is just due to randomization, see
> arch_align_stack() in arch/x86/kernel/process.c.
>
> So that would explain why it doesn't happen every time.
>
> What happens if you apply the attached patch to *always* force the 2MB
> shift (rather than moving the stack by a random amount), and then run
> the other program (t.c -> compiled to "a.out").
I applied your patch and started the test in a loop of a million iterations,
but it only ran 35 times; the job seems to have hit a 1-hour timeout.

Kernel messages printed while testing a.out:

a.out (480) used greatest stack depth: 4872 bytes left

On another device:

kworker/dying (172) used greatest stack depth: 5044 bytes left

I am re-running the test with a longer timeout of 4 hours and will share
the findings.
ref:
https://lkft.validation.linaro.org/scheduler/job/1555132#L1515
- Naresh