Message-ID: <CAJZ5v0j+w4tBiFL16Dz_T1Jokj3pYubt2z0DP-eB2sFA5W3-rg@mail.gmail.com>
Date: Thu, 30 Jun 2016 13:27:16 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Borislav Petkov <bp@...en8.de>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Logan Gunthorpe <logang@...tatee.com>,
Kees Cook <keescook@...omium.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
lkml <linux-kernel@...r.kernel.org>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Andy Lutomirski <luto@...nel.org>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Linux PM list <linux-pm@...r.kernel.org>,
Stephen Smalley <sds@...ho.nsa.gov>
Subject: Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thu, Jun 30, 2016 at 11:45 AM, Borislav Petkov <bp@...en8.de> wrote:
> On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote:
>> That's not what Boris was seeing at least.
>
> Well, I had it a couple of times during testing patches. This is all
> from the logs:
>
> [ 65.121109] PM: Basic memory bitmaps freed
> [ 65.125991] Restarting tasks ...
> [ 65.129342] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
> [ 65.129585] done.
> [ 65.141314] BUG: unable to handle kernel paging request at ffff88042b957e40
I mean the failure mode, not the particular exception type. :-)
You always saw it in a user space task after kernel resume:
> [ 65.141340] Call Trace:
> [ 65.141344] [<ffffffff81181e1e>] ? getname_flags+0x5e/0x1b0
> [ 65.141346] [<ffffffff811782bf>] ? cp_new_stat+0x10f/0x120
> [ 65.141348] [<ffffffff810bb33a>] ? ktime_get_ts64+0x4a/0xf0
> [ 65.141353] [<ffffffff81185fc7>] ? poll_select_copy_remaining+0xe7/0x130
> [ 65.141355] [<ffffffff8100263a>] exit_to_usermode_loop+0x8a/0xb0
> [ 65.141356] [<ffffffff81002a6b>] syscall_return_slowpath+0x5b/0x70
> [ 65.141358] [<ffffffff81688e72>] entry_SYSCALL_64_fastpath+0xa5/0xa7
[cut]
> [ 381.850792] Call Trace:
> [ 381.850795] [<ffffffff8117f8ae>] ? getname_flags+0x5e/0x1b0
> [ 381.850797] [<ffffffff81175d5f>] ? cp_new_stat+0x10f/0x120
> [ 381.850799] [<ffffffff810b9eca>] ? ktime_get_ts64+0x4a/0xf0
> [ 381.850800] [<ffffffff81183a57>] ? poll_select_copy_remaining+0xe7/0x130
> [ 381.850802] [<ffffffff8100263a>] exit_to_usermode_loop+0x8a/0xb0
> [ 381.850804] [<ffffffff81002a6b>] syscall_return_slowpath+0x5b/0x70
> [ 381.850806] [<ffffffff81688272>] entry_SYSCALL_64_fastpath+0xa5/0xa7
[cut]
> [ 49.022675] Call Trace:
> [ 49.022680] [<ffffffff8117f8ae>] ? getname_flags+0x5e/0x1b0
> [ 49.022683] [<ffffffff81175d5f>] ? cp_new_stat+0x10f/0x120
> [ 49.022686] [<ffffffff810b9eca>] ? ktime_get_ts64+0x4a/0xf0
> [ 49.022689] [<ffffffff81183a57>] ? poll_select_copy_remaining+0xe7/0x130
> [ 49.022692] [<ffffffff8100263a>] exit_to_usermode_loop+0x8a/0xb0
> [ 49.022695] [<ffffffff81002a6b>] syscall_return_slowpath+0x5b/0x70
> [ 49.022698] [<ffffffff81688272>] entry_SYSCALL_64_fastpath+0xa5/0xa7
[cut]
> [ 39.636905] Call Trace:
> [ 39.636908] [<ffffffff8117f8be>] ? getname_flags+0x5e/0x1b0
> [ 39.636910] [<ffffffff81175d6f>] ? cp_new_stat+0x10f/0x120
> [ 39.636912] [<ffffffff810b9eaa>] ? ktime_get_ts64+0x4a/0xf0
> [ 39.636917] [<ffffffff81183a67>] ? poll_select_copy_remaining+0xe7/0x130
> [ 39.636919] [<ffffffff8100263a>] exit_to_usermode_loop+0x8a/0xb0
> [ 39.636921] [<ffffffff81002a6b>] syscall_return_slowpath+0x5b/0x70
> [ 39.636922] [<ffffffff81688272>] entry_SYSCALL_64_fastpath+0xa5/0xa7
which is a clear indication of image corruption during restore.

In Logan's case this happens in swsusp_arch_resume() proper and
the address in RIP is relative to the identity mapping, so the only
place it can happen is the jump to relocated_restore_code. That's
because before that jump the addresses in RIP are relative to the
kernel text mapping, and after it we immediately switch over to the
temporary page tables, which are all executable. So that is the only
place AFAICS.
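
To make the address reasoning above concrete, here is a minimal sketch in C
that tells an identity-mapping (direct-mapping) address apart from a
kernel-text-mapping address. It is not kernel code. The range constants
assume the x86-64 layout with 4-level paging and no KASLR that was documented
for kernels of this era, and classify_addr() and region_name() are
hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: x86-64, 4-level paging, no KASLR (2016-era layout).
 * These ranges are illustrative and differ on other configurations.
 */
#define DIRECT_MAP_START  0xffff880000000000ULL  /* identity (direct) mapping */
#define DIRECT_MAP_END    0xffffc7ffffffffffULL
#define KERNEL_TEXT_START 0xffffffff80000000ULL  /* kernel text mapping */
#define KERNEL_TEXT_END   0xffffffff9fffffffULL

enum addr_region {
	REGION_DIRECT_MAP,
	REGION_KERNEL_TEXT,
	REGION_OTHER
};

/* Hypothetical helper: report which mapping a virtual address falls in. */
enum addr_region classify_addr(uint64_t addr)
{
	if (addr >= DIRECT_MAP_START && addr <= DIRECT_MAP_END)
		return REGION_DIRECT_MAP;
	if (addr >= KERNEL_TEXT_START && addr <= KERNEL_TEXT_END)
		return REGION_KERNEL_TEXT;
	return REGION_OTHER;
}

/* Hypothetical helper: human-readable name for a region. */
const char *region_name(enum addr_region region)
{
	switch (region) {
	case REGION_DIRECT_MAP:
		return "identity (direct) mapping";
	case REGION_KERNEL_TEXT:
		return "kernel text mapping";
	case REGION_OTHER:
		return "other";
	}
	assert(0 && "unhandled region");
	return "other";
}
```

Under these assumptions, the faulting address ffff88042b957e40 from the
quoted log classifies as an identity-mapping address, while a return address
such as ffffffff8100263a from the quoted call traces classifies as a
kernel-text-mapping address.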
Also, in your case the failure was 100% reproducible, while in
Logan's case it has happened once so far (so generally it happens once
in a blue moon).
In summary, I'm sure that this is a different issue.
Thanks,
Rafael