lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 30 Jun 2016 08:24:02 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	"Rafael J. Wysocki" <rafael@...nel.org>
Cc:	Borislav Petkov <bp@...en8.de>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Logan Gunthorpe <logang@...tatee.com>,
	Kees Cook <keescook@...omium.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	lkml <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Andy Lutomirski <luto@...nel.org>,
	Brian Gerst <brgerst@...il.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Linux PM list <linux-pm@...r.kernel.org>,
	Stephen Smalley <sds@...ho.nsa.gov>
Subject: Re: [PATCH v4] x86/power/64: Fix kernel text mapping corruption
 during image restoration

On Thu, Jun 30, 2016 at 8:17 AM, Rafael J. Wysocki <rafael@...nel.org> wrote:
> On Thu, Jun 30, 2016 at 5:05 PM, Borislav Petkov <bp@...en8.de> wrote:
>> On Thu, Jun 30, 2016 at 03:17:20PM +0200, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>>
>>> Logan Gunthorpe reports that hibernation stopped working reliably for
>>> him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
>>> and rodata).
>>
>> ...
>>
>>> +static int relocate_restore_code(void)
>>> +{
>>> +     pgd_t *pgd;
>>> +     pud_t *pud;
>>> +
>>> +     relocated_restore_code = get_safe_page(GFP_ATOMIC);
>>> +     if (!relocated_restore_code)
>>> +             return -ENOMEM;
>>> +
>>> +     memcpy((void *)relocated_restore_code, &core_restore_code, PAGE_SIZE);
>>> +
>>> +     /* Make the page containing the relocated code executable */
>>> +     pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
>>> +     pud = pud_offset(pgd, relocated_restore_code);
>>> +     if (pud_large(*pud)) {
>>> +             set_pud(pud, __pud(pud_val(*pud) & ~_PAGE_NX));
>>> +     } else {
>>> +             pmd_t *pmd = pmd_offset(pud, relocated_restore_code);
>>> +
>>> +             if (pmd_large(*pmd)) {
>>> +                     set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
>>> +             } else {
>>> +                     pte_t *pte = pte_offset_kernel(pmd, relocated_restore_code);
>>> +
>>> +                     set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
>>> +             }
>>> +     }
>>> +     flush_tlb_all();
>>
>> I know you want to flush TLBs but this causes the splat below on the
>> resume kernel.
>>
>> Most likely because:
>>
>> resume_target_kernel() does local_irq_disable() and then
>>
>> swsusp_arch_resume() -> relocate_restore_code() -> flush_tlb_all()
>>
>> and smp_call_function_many() doesn't like it when IRQs are disabled.
>
> Right.
>
> Can I invoke __flush_tlb_all() from here to flush the TLB on the local
> CPU?  That should be sufficient IMO.

Yes, unless there's another CPU already up at the time, but that seems unlikely.

FWIW, the CPU can and will cache the NX bit and the CPU will *not*
refresh the TLB before sending a page fault.  You definitely need to
flush.  I wonder if this is also why you're seeing a certain amount of
random corruption.

--Andy

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ