linux-kernel - Re: [criu] 1M guard page ruined restore


Message-ID: <20170622142300.GA762@redhat.com>
Date:   Thu, 22 Jun 2017 16:23:00 +0200
From:   Oleg Nesterov <oleg@...hat.com>
To:     Cyrill Gorcunov <gorcunov@...il.com>
Cc:     Hugh Dickins <hughd@...gle.com>, Andrey Vagin <avagin@...nvz.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...tuozzo.com>,
        Dmitry Safonov <dsafonov@...tuozzo.com>,
        Andrew Morton <akpm@...uxfoundation.org>,
        Adrian Reber <areber@...hat.com>
Subject: Re: [criu] 1M guard page ruined restore

Cyrill,

I am replying to my own email because I got lost in numerous threads/emails
connected to stack guard/gap problems. IIRC you confirmed that the 1st load
doesn't fail and the patch fixes the problem. So everything is clear, and we
will discuss this change in another thread.

But let me add that (imo) you should not change this test-case. You simply
should not run it if kerndat_mm_guard_page_maps() detects the new kernel at
startup.

The new version makes no sense for criu, afaics. Yes, yes, thank you very
much for this test-case, it found the kernel regression ;) But criu has
nothing to do with this problem, and it is not clear right now if we are
going to fix it or not.

With the recent kernel changes criu should never look outside of the start-end
region reported by /proc/<pid>/maps. And restore doesn't even need to know
whether a GROWSDOWN region will actually grow or not: (iiuc) you do not need
to auto-grow the stack vma during restore, because criu re-creates the whole
vma with the same length using MAP_FIXED, and it should never write below the
addr returned by mmap(MAP_FIXED).

So (afaics) the only complication is that the process can be dumped on
a system running with (say) stack_guard_gap=4K kernel parameter, and then
restored on another system running with stack_guard_gap=1M. In this case
the application may fail after restore if it tries to auto-grow the stack,
but this is unlikely and this is another story.
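
A simplified model of why the gap size matters after restore. This is a
sketch under assumptions, not kernel code: can_grow_down() is a made-up name,
and it only models the rule that the mapping below the stack must end at
least guard_gap bytes under the new stack start.

```c
#include <stdbool.h>

/*
 * Simplified model (assumption, not kernel code): a GROWSDOWN vma may
 * extend down to new_start only if the mapping below it ends at or
 * before new_start - guard_gap.
 */
static bool can_grow_down(unsigned long new_start, unsigned long prev_end,
			  unsigned long guard_gap)
{
	if (new_start < guard_gap)	/* the gap would wrap below address 0 */
		return false;
	return prev_end <= new_start - guard_gap;
}
```

For example, with a neighbour ending 64K below the new stack start, the
model allows the growth for a 4K gap and refuses it for a 1M gap, which is
the dump-here, restore-there case described above.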

Oleg.

On 06/21, Oleg Nesterov wrote:
>
> On 06/21, Cyrill Gorcunov wrote:
> >
> > On Wed, Jun 21, 2017 at 05:57:30PM +0200, Oleg Nesterov wrote:
> > > >
> > > > 	p = fake_grow_down;
> > > > 	*p-- = 'c';
> > >
> > > I guess this works? I mean, *p-- = 'c' should not fail...
> >
> > It fails.
>
> Hmm. Impossible ;) could you add the additional printf's to re-check?
>
> > Here is the complete code. It is supposed to _extend_ the stack, but it fails
> > on the latest master + Hugh's [PATCH] mm: fix new crash in unmapped_area_topdown()
> > ---
> > [root@fc2 criu]# ~/st2
> > start_addr 7fe6162a8000
> > start_addr 7fe6163d9000
> > Segmentation fault (core dumped)
> > ---
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <errno.h>
> > #include <string.h>
> > #include <unistd.h>
> >
> > #include <sys/mman.h>
> >
> > #define PAGE_SIZE 4096
> >
> > int main(int argc, char **argv)
> > {
> > 	char *start_addr, *start_addr1, *fake_grow_down, *test_addr, *grow_down;
> > 	volatile char *p;
> >
> > 	start_addr = mmap(NULL, PAGE_SIZE * 512, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> > 	if (start_addr == MAP_FAILED) {
> > 		printf("Can't map a new region\n");
> > 		return 1;
> > 	}
> > 	printf("start_addr %lx\n", (unsigned long)start_addr);
> > 	munmap(start_addr, PAGE_SIZE * 512);
> >
> > 	start_addr += PAGE_SIZE * 300;
> >
> > 	fake_grow_down = mmap(start_addr + PAGE_SIZE * 5, PAGE_SIZE,
> > 			 PROT_READ | PROT_WRITE,
> > 			 MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED | MAP_GROWSDOWN, -1, 0);
> > 	if (fake_grow_down == MAP_FAILED) {
> > 		printf("Can't map a new region\n");
> > 		return 1;
> > 	}
> > 	printf("start_addr %lx\n", (unsigned long)fake_grow_down);
> >
> > 	p = fake_grow_down;
> > 	*p-- = 'c';
>
> once again, I can't believe this STORE can fail...
>
> > 	*p = 'b';
>
> Ah. I forgot about another kernel "feature" ;) not related to the recent guard
> page changes...
>
> Could you test the patch below?
>
> Oleg.
>
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 8ad91a0..edc5d68 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1416,7 +1416,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
>  		 * and pusha to work. ("enter $65535, $31" pushes
>  		 * 32 pointers and then decrements %sp by 65535.)
>  		 */
> -		if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
> +if (0)		if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
>  			bad_area(regs, error_code, address);
>  			return;
>  		}
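
The condition that the patch above disables can be modelled in user space
as follows. This is only a sketch: access_too_far_below_sp() is an
illustrative name, and the function copies just the arithmetic from the
quoted line, not the surrounding fault-handling logic.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model of the x86 check quoted in the diff (illustrative
 * name): a faulting access below a GROWSDOWN vma is treated as bad when
 * it lies more than 64K plus 32 words below the stack pointer.
 */
static bool access_too_far_below_sp(unsigned long address, unsigned long sp)
{
	return address + 65536 + 32 * sizeof(unsigned long) < sp;
}
```

In the test-case the GROWSDOWN mapping is not the real stack, so the stack
pointer presumably sits far above it, and a store just below the mapping
would satisfy this condition.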