[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whxP0Gj70pJN5R7Qec4qjrGr+G9Ex7FJi7=_fPcdQ2ocQ@mail.gmail.com>
Date: Sun, 12 Jul 2020 15:58:06 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: Naresh Kamboju <naresh.kamboju@...aro.org>,
linux- stable <stable@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>, Arnd Bergmann <arnd@...db.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
lkft-triage@...ts.linaro.org, Chris Down <chris@...isdown.name>,
Michel Lespinasse <walken@...gle.com>,
Fan Yang <Fan_Yang@...u.edu.cn>,
Brian Geffon <bgeffon@...gle.com>,
Anshuman Khandual <anshuman.khandual@....com>,
Will Deacon <will@...nel.org>,
Catalin Marinas <catalin.marinas@....com>, pugaowei@...il.com,
Jerome Glisse <jglisse@...hat.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Hugh Dickins <hughd@...gle.com>,
Al Viro <viro@...iv.linux.org.uk>, Tejun Heo <tj@...nel.org>,
Sasha Levin <sashal@...nel.org>
Subject: Re: WARNING: at mm/mremap.c:211 move_page_tables in i386
On Sun, Jul 12, 2020 at 2:50 PM Joel Fernandes <joel@...lfernandes.org> wrote:
>
> I reproduced Naresh's issue on a 32-bit x86 machine and the below patch fixes it.
> The issue is solely within execve() itself and the way it allocates/copies the
> temporary stack.
>
> It is actually indeed an overlapping case because the length of the
> stack is big enough to cause overlap. The VMA grows quite a bit because of
> all the page faults that happen due to the copy of the args/env. Then during
> the move of overlapped region, it finds that a PMD is already allocated.
Oh, ok, I think I see.
So the issue is that while move_normal_pmd() _itself_ will be clearing
the old pmd entries when it copies them, the 4kB copies that happened
_before_ this time will not have cleared the old pmd that they were
in.
So we've had a deeper stack, and we've already copied some of it one
page at a time, and we're done with those 4kB entries, but now we hit
a 2MB-aligned case and want to move that down. But when it does that,
it hits the (by now hopefully empty) pmd that used to contain the 4kB
entries we copied earlier.
So we've cleared the page table, but we've simply never called
pgtable_clear() here, because the page table got cleared in
move_ptes() when it did that
pte = ptep_get_and_clear(mm, old_addr, old_pte);
on the previous old pte entries.
That makes sense to me. Does that match what you see? Because when I
said it wasn't overlapped, I was really talking about that one single
pmd move itself not being overlapped.
> The below patch fixes it and is not warning anymore in 30 minutes of testing
> so far.
So I'm not hugely happy with the patch, I have to admit.
Why?
Because:
> + /* Ensure the temporary stack is shifted by atleast its size */
> + if (stack_shift < (vma->vm_end - vma->vm_start))
> + stack_shift = (vma->vm_end - vma->vm_start);
This basically overrides the random stack shift done by arch_align_stack().
Yeah, yeah, arch_align_stack() is kind of misnamed. It _used_ to do
what the name implies it does, which on x86 is to just give the
minimum ABI-specified 16-byte stack alignment.
But these days, what it really does is say "align the stack pointer,
but also shift it down by a random amount within 8kB so that the start
of the stack isn't some repeatable thing that an attacker can
trivially control with the right argv/envp size.."
I don't think it works right anyway because we then PAGE_ALIGN() it,
but whatever.
But it looks like what you're doing means that now the size of the
stack will override that shift, and that means that now the
randomization is entirely undone. No?
Plus you do it after we've also checked that we haven't grown the
stack down to below mmap_min_addr().
But maybe I misread that patch.
But I do feel like you figured out why the bug happened, now we're
just discussing whether the patch is the right thing to do.
Maybe saying "doing the pmd copies for the initial stack isn't
important, so let's just note this as a special case and get rid of
the WARN_ON()" might be an alternative solution.
The reason I worried was that I felt like we didn't understand why the
WARN_ON() happened. Now that I do, I think we could just say "ok,
don't warn, we know that this can happen, and it's harmless".
Anybody else have any opinions?
Linus
Powered by blists - more mailing lists