Date:   Tue, 20 Jun 2017 03:23:20 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Cyrill Gorcunov <gorcunov@...il.com>
cc:     Hugh Dickins <hughd@...gle.com>, Andrey Vagin <avagin@...nvz.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...tuozzo.com>,
        Dmitry Safonov <dsafonov@...tuozzo.com>,
        Andrew Morton <akpm@...uxfoundation.org>,
        Oleg Nesterov <oleg@...hat.com>
Subject: Re: [criu] 1M guard page ruined restore

On Tue, 20 Jun 2017, Cyrill Gorcunov wrote:

> Hi Hugh! We're running our tests on latest vanilla kernel all the time,
> and recently we've got an issue on restore:
> 
> https://github.com/xemul/criu/issues/322
> 
>  | (00.410614)      4: cg: Cgroups 1 inherited from parent
>  | (00.410858)      4: Opened local page read 3 (parent 0)
>  | (00.410961)      4:     premap 0x00000000400000-0x00000000406000 -> 00007fe65badf000
>  | (00.410981)      4:     premap 0x00000000605000-0x00000000606000 -> 00007fe65bae5000
>  | (00.410997)      4:     premap 0x00000000606000-0x00000000607000 -> 00007fe65bae6000
>  | (00.411013)      4:     premap 0x000000025a0000-0x000000025c1000 -> 00007fe65bae7000
>  | (00.411036)      4: Error (criu/mem.c:726): Unable to remap a private vma: Invalid argument
>  | (00.412779)      1: Error (criu/cr-restore.c:1465): 4 exited, status=1
> 
> Andrew has narrowed it down to the commit
> 
>  | commit 1be7107fbe18eed3e319a6c3e83c78254b693acb
>  | Author: Hugh Dickins <hughd@...gle.com>
>  | Date:   Mon Jun 19 04:03:24 2017 -0700
>  | 
>  |     mm: larger stack guard gap, between vmas
> 
> and looking into the patch I see the procfs output has been changed
> 
>  | diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>  | index f0c8b33..520802d 100644
>  | --- a/fs/proc/task_mmu.c
>  | +++ b/fs/proc/task_mmu.c
>  | @@ -300,11 +300,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
>  |  
>  |         /* We don't show the stack guard page in /proc/maps */
>  |         start = vma->vm_start;
>  | -       if (stack_guard_page_start(vma, start))
>  | -               start += PAGE_SIZE;
>  |         end = vma->vm_end;
>  | -       if (stack_guard_page_end(vma, end))
>  | -               end -= PAGE_SIZE;
>  |  
>  |         seq_setwidth(m, 25 + sizeof(void *) * 6 - 1);
>  |         seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu ",
> 
> For which we of course are not ready because we've been implying the
> guard page is returned here so we adjust addresses locally when saving
> them into images.
> 
> So now we need to figure out somehow if show_map_vma accounts [PAGE_SIZE|guard_area] or not,
> I guess we might use kernel version here but it won't be working fine on custom kernels,
> or kernels with the patch backported.
> 
> Second, I guess we might need to detect @stack_guard_gap at runtime as
> well, but I'm not sure yet because we have only just found this problem
> and haven't investigated it deeply yet. Hopefully we will in a
> day or two (I guess we still have some time before the final
> kernel release).

Sorry for breaking you: we realized there was some risk of that.

Would it be acceptable to you, to judge which kind of a kernel it is,
by whether it has a global variable stack_guard_gap?  I don't know
if that would be a horrible hack, or the kind of thing that you're
used to doing all over the place.  Judging by kernel version will
be awkward, since the patch is being backported to stable kernels.
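
A minimal sketch of how userspace might probe for that symbol, assuming
the kernel exposes /proc/kallsyms. The helper names below are
hypothetical and are not taken from CRIU. Note the assumption that data
symbols are listed at all: on kernels built without CONFIG_KALLSYMS_ALL
they may be absent, so a negative result is not conclusive.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a symbol named `name` appears in the kallsyms-format
 * stream `f`, 0 otherwise. Each line looks like
 * "ffffffff81e0a2c0 D stack_guard_gap", optionally followed by a
 * "[module]" field, which is ignored.
 */
int kallsyms_has_symbol(FILE *f, const char *name)
{
    char line[512];
    char sym[256];

    while (fgets(line, sizeof(line), f)) {
        /* Skip the address and the type letter, keep the symbol name. */
        if (sscanf(line, "%*s %*s %255s", sym) == 1 &&
            strcmp(sym, name) == 0)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical wrapper: 1 = symbol present, 0 = not listed,
 * -1 = /proc/kallsyms could not be opened (cannot tell).
 */
int kernel_has_stack_guard_gap(void)
{
    FILE *f = fopen("/proc/kallsyms", "r");
    int ret;

    if (!f)
        return -1;
    ret = kallsyms_has_symbol(f, "stack_guard_gap");
    fclose(f);
    return ret;
}
```

The parsing is kept separate from the file open so that it can be
exercised on any stream; a caller would treat -1 and 0 as "unknown or
old behaviour" and fall back to some other check.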

But I'm surprised by your explanation above: maybe I'm confused,
or maybe the explanation is different.  Because as I see it, the
change I made in that patch *maintained* consistency for CRIU:

It used to be the case that there was a gap page included in the
extent of the stack vma, but it didn't really belong in there,
therefore show_map_vma() massaged the addresses shown to conceal it.

Whereas now with the 1be7107fbe18 commit, the gap (page or more)
is not included in the extent of the stack vma, so there's no
longer any need to massage the addresses shown to conceal it.
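
The two paragraphs above can be put into a small model. This is only an
illustrative sketch, not kernel code: the struct, the function names and
the 4 KiB page size are assumptions, and the old check is simplified to
"the vma grows down" (the real stack_guard_page_start() also looked at
the neighbouring vma).

```c
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Simplified stand-in for the fields of a vma that matter here. */
struct vma_model {
    unsigned long vm_start;
    unsigned long vm_end;
    bool grows_down;            /* true for a stack vma */
};

/*
 * Before 1be7107fbe18: the guard page sat inside the stack vma,
 * so the start address shown in /proc/<pid>/maps skipped over it.
 */
unsigned long shown_start_old(const struct vma_model *v)
{
    return v->grows_down ? v->vm_start + MODEL_PAGE_SIZE : v->vm_start;
}

/*
 * After 1be7107fbe18: the gap lies outside the vma, so the start
 * address is shown as it is.
 */
unsigned long shown_start_new(const struct vma_model *v)
{
    return v->vm_start;
}
```

For a stack whose usable memory begins at 0x7ffd0000, the old vma would
start one page lower and be shown from 0x7ffd0000, while the new vma
starts at 0x7ffd0000 and is shown unchanged: under this model both
kernels print the same start address.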

We do need to understand this fairly quickly, since those stable
backports will pose more of a problem for you than the v4.12
release itself.

Hugh
