Date: Mon, 23 Nov 2015 08:32:11 +0100
From: Juergen Gross <jgross@...e.com>
To: vasvir@....demokritos.gr
Cc: linux-kernel@...r.kernel.org, Toshi Kani <toshi.kani@...com>,
	"Luis R. Rodriguez" <mcgrof@...e.com>
Subject: Re: Hibernate resume bug around 3,18-rc2 - Full PAT support

On 21/11/15 12:49, Vassilis Virvilis wrote:
> On 11/20/2015 02:23 PM, Juergen Gross wrote:
>> On 20/11/15 11:04, vasvir@....demokritos.gr wrote:
>>>> I've just found a potential issue: In case MTRR is disabled by the BIOS
>>>> the PAT register of the boot processor won't be restored after resume.
>>>>
>>>> Can you check whether pr_info("MTRR: Disabled\n") has been executed in
>>>> early boot? If yes, this might be a BIOS option.
>>>>
>>>
>>> I don't have access right now. I will test it later tonight (This is my
>>> home machine).
>>>
>>> Would $dmesg | grep -i mtrr suffice or I need to look for the mtrr
>>> somewere else e.g. /proc /sys etc?
>>
>> I think grepping for MTRR in dmesg should be enough.
>
> kernel 4.3 +nopat also died on the 4th or the 5th hibernate on the
> familiar (see previously attached image) "Calling lapic..." place.
>
> $dmesg | grep -i mtr for 4.3 kernel with notpat
> [ 0.189113] calling mtrr_if_init+0x0/0x5f @ 1
> [ 0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [ 0.189222] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000]
> with a huge-page mapping due to MTRR override.
> [ 0.189559] calling mtrr_init_finialize+0x0/0x3a @ 1
> [ 0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0
> usecs
> [ 8.994140] mtrr: type mismatch for e0000000,10000000 old: write-back
> new: write-combining
> [ 8.994154] Failed to add WC MTRR for
> [00000000e0000000-00000000efffffff]; performance may suffer.
>
> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> [ 0.189368] calling mtrr_if_init+0x0/0x5f @ 1
> [ 0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [ 0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000]
> with a huge-page mapping due to MTRR override.
> [ 0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
> [ 0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0
> usecs
>
>
> I also checked my BIOS. I found nothing about mtrr. My BIOS manual is
> ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option
> about MTRR?

As the BIOS obviously isn't disabling MTRR I don't think we have to
go that route any longer.

> Question: If we assume your theory is correct about mtrr/pat, wouldn't
> lockup/hang reboot every time the system goes to hibernate/resume? Can
> this assumption explain why the first hibernation/resume cycles in rapid
> succession after system boot are working and the long ones fail somewhat
> more consistently?

Hmm, I'm really not sure. It would depend on the usage of non-standard
cache mode mappings. But as MTRR isn't disabled this theory won't apply
to your problem.

> Note: With PAT enabled the system boots up significantly faster.
>
> In the weekend I will return to 3.18-rc2 and I will try to verify my
> bisection is correct. Double guessing your self is a terrible thing...

Thanks.

> I will also try with nopat and I will run dmesg | grep -i mtr and post
> results
>
> Unless you have any other suggestions...

I think we have to find out where the kernel is really hanging. Do you
have any chance to trigger a NMI?

Looking into suspend/resume code I found a strange inconsistency for the
lapic handling:

lapic_suspend()
{
...
#ifdef CONFIG_X86_THERMAL_VECTOR
	if (maxlvt >= 5)
		apic_pm_state.apic_thmr = apic_read(APIC_LVTTHMR);
#endif
...
}

lapic_resume()
{
...
#if defined(CONFIG_X86_MCE_INTEL)
	if (maxlvt >= 5)
		apic_write(APIC_LVTTHMR, apic_pm_state.apic_thmr);
#endif
...
}

and comparing that to:

clear_local_APIC()
{
...
#ifdef CONFIG_X86_THERMAL_VECTOR
	if (maxlvt >= 5) {
		v = apic_read(APIC_LVTTHMR);
		apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
	}
#endif
#ifdef CONFIG_X86_MCE_INTEL
	if (maxlvt >= 6) {
		v = apic_read(APIC_LVTCMCI);
		if (!(v & APIC_LVT_MASKED))
			apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
	}
#endif
...
}

I think it would be interesting to know your kernel config...


Juergen
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/