linux-kernel - Re: [criu] 1M guard page ruined restore

Date:   Thu, 22 Jun 2017 18:05:00 +0300
From:   Cyrill Gorcunov <gorcunov@...il.com>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Hugh Dickins <hughd@...gle.com>, Andrey Vagin <avagin@...nvz.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Pavel Emelyanov <xemul@...tuozzo.com>,
        Dmitry Safonov <dsafonov@...tuozzo.com>,
        Andrew Morton <akpm@...uxfoundation.org>,
        Adrian Reber <areber@...hat.com>
Subject: Re: [criu] 1M guard page ruined restore

On Thu, Jun 22, 2017 at 04:23:00PM +0200, Oleg Nesterov wrote:
> Cyrill,
> 
> I am replying to my own email because I got lost in numerous threads/emails
> connected to stack guard/gap problems. IIRC you confirmed that the 1st load
> doesn't fail and the patch fixes the problem. So everything is clear, and we
> will discuss this change in another thread.

Yes.

> But let me add that (imo) you should not change this test-case. You simply
> should not run it if kerndat_mm_guard_page_maps() detects the new kernel at
> startup.
> 
> The new version makes no sense for criu, afaics. Yes, yes, thank you very
> much for this test-case, it found the kernel regression ;) But criu has
> nothing to do with this problem, and it is not clear right now if we are
> going to fix it or not.

To be fair, the first reporter is Andrey Vagin :) He wrote the test and
poked me to look into it. If we're not going to fix it in the kernel, then
sure -- we won't run it on new kernels (though hell knows what other
applications may fail; as Linus pointed out, it's perfectly valid to map
and autogrow the vma).
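Skipping the test on new kernels needs a runtime probe like the kerndat_mm_guard_page_maps() check mentioned above. Below is a minimal sketch of how such a probe could work. It is not CRIU's implementation, and the name maps_hide_guard_page() is hypothetical. The sketch assumes that older kernels report the start of a MAP_GROWSDOWN vma in /proc/self/maps one page higher (hiding the in-vma guard page), while kernels with the stack guard gap change report the real start.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical probe (not CRIU code). Returns:
 *   1  if /proc/self/maps hides a guard page at the bottom of a fresh
 *      MAP_GROWSDOWN vma (assumed older-kernel behaviour),
 *   0  if the vma start is reported as-is (assumed newer-kernel behaviour),
 *  -1  on error or if the vma could not be identified.
 */
static int maps_hide_guard_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char *map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) {
        munmap(map, 2 * page);
        return -1;
    }

    char line[4096];
    int ret = -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned long start, end;

        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        /* Identify our vma by its end address, which is never adjusted. */
        if (end != (unsigned long)map + 2 * page)
            continue;
        if (start == (unsigned long)map + page)
            ret = 1;        /* start shifted up by one page: guard hidden */
        else if (start == (unsigned long)map)
            ret = 0;        /* real start reported */
        break;
    }

    fclose(f);
    munmap(map, 2 * page);
    return ret;
}
```

A test runner could call this once at startup and skip the affected test case when it returns 0.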

> With the recent kernel changes criu should never look outside of start-end
> region reported by /proc/maps; and restore doesn't even need to know if a
> GROWSDOWN region will actually grow or not, because (iiuc) you do not need
> to auto-grow the stack vma during restore, criu re-creates the whole vma
> with the same length using MAP_FIXED and it should never write below the
> addr returned by mmap(MAP_FIXED).

Yes, and we already do, thanks.
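The restore approach described above re-creates the whole vma at its dumped start-end with MAP_FIXED, without relying on auto-growth. It could look roughly like the sketch below. The helper names parse_vma_range() and recreate_growsdown_vma() are hypothetical, and the fixed PROT_READ | PROT_WRITE protection is an assumption. A real restorer would apply the protection and flags it recorded at dump time.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

/* Parse the leading "start-end" field of a /proc/<pid>/maps line. */
static int parse_vma_range(const char *line, unsigned long *start,
                           unsigned long *end)
{
    return sscanf(line, "%lx-%lx", start, end) == 2 && *start < *end ? 0 : -1;
}

/*
 * Re-create a GROWSDOWN vma with exactly the dumped start and length.
 * Because the full length is mapped up front at the recorded address, the
 * restorer never needs the vma to auto-grow and never touches memory below
 * the address returned by mmap(). Note that MAP_FIXED silently replaces
 * any existing mapping in [start, end).
 */
static void *recreate_growsdown_vma(unsigned long start, unsigned long end)
{
    void *addr = mmap((void *)start, end - start, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_GROWSDOWN,
                      -1, 0);
    return addr == MAP_FAILED ? NULL : addr;
}
```

The design choice is to trust only the start-end range from /proc/maps. Whether the kernel would later let the region grow does not matter for restore itself.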

> So (afaics) the only complication is that the process can be dumped on
> a system running with (say) stack_guard_gap=4K kernel parameter, and then
> restored on another system running with stack_guard_gap=1M. In this case
> the application may fail after restore if it tries to auto-grow the stack,
> but this is unlikely and this is another story.

Yes, it's a different problem, and it would be cool to be able to fetch
this value somehow (maybe via sysfs or something). Otherwise, if such a
container migration case happens, we will simply find an error code in the
restore log, and I fear it won't be clear that the error happened exactly
because the gap settings differ between the two systems.
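A process might be dumped with a small gap and restored with a large one. One way to make that case obvious in a restore log is to compute the effective gap and check whether the stack could still grow by the amount it needs. The sketch below assumes the gap can only be recovered from the kernel command line, for example the contents of /proc/cmdline. There, stack_guard_gap= is given in pages, and the default is 256 pages. The helper names are hypothetical. The growth check is a simplified form of the kernel rule that a growing stack's new start must stay at least the gap away from the end of the mapping below it.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Kernel default: 256 pages (1 MiB with 4 KiB pages). */
#define DEFAULT_STACK_GUARD_GAP_PAGES 256UL

/*
 * Hypothetical helper: stack guard gap in bytes as implied by a kernel
 * command line string. Only an explicit stack_guard_gap=<pages> override is
 * visible here; otherwise the default is assumed. In this sketch the last
 * occurrence wins.
 */
static unsigned long stack_guard_gap_bytes(const char *cmdline,
                                           unsigned long page_size)
{
    static const char key[] = "stack_guard_gap=";
    const size_t klen = sizeof(key) - 1;
    unsigned long pages = DEFAULT_STACK_GUARD_GAP_PAGES;
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept only matches that start a whitespace-separated word. */
        if (p == cmdline || p[-1] == ' ') {
            char *endp;
            unsigned long v = strtoul(p + klen, &endp, 10);

            if (endp != p + klen &&
                (*endp == '\0' || *endp == ' ' || *endp == '\n'))
                pages = v;
        }
        p += klen;
    }
    return pages * page_size;
}

/*
 * Hypothetical check: can a stack vma starting at stack_start grow down by
 * grow_bytes while keeping at least gap_bytes between its new start and the
 * end of the mapping below it (prev_end)? Assumes prev_end <= stack_start.
 */
static bool stack_can_grow(unsigned long prev_end, unsigned long stack_start,
                           unsigned long grow_bytes, unsigned long gap_bytes)
{
    unsigned long free_bytes = stack_start - prev_end;

    if (grow_bytes > free_bytes)
        return false;               /* would overlap the mapping below */
    return free_bytes - grow_bytes >= gap_bytes;
}
```

Consider a stack with a 2 MiB hole below it. A 1.5 MiB growth fits under a one-page (4 KiB) gap but not under the 1 MiB default. Logging both gap values alongside such a failure would point directly at the settings mismatch.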