Date:	Mon, 23 Nov 2015 19:56:12 +0100
From:	"Luis R. Rodriguez" <mcgrof@...e.com>
To:	Vassilis Virvilis <vasvir@....demokritos.gr>
Cc:	Juergen Gross <jgross@...e.com>, linux-kernel@...r.kernel.org,
	Toshi Kani <toshi.kani@...com>
Subject: Re: Hibernate resume bug around 3,18-rc2 - Full PAT support

On Sat, Nov 21, 2015 at 01:49:06PM +0200, Vassilis Virvilis wrote:
> On 11/20/2015 02:23 PM, Juergen Gross wrote:
> >On 20/11/15 11:04, vasvir@....demokritos.gr wrote:
> >>>I've just found a potential issue: In case MTRR is disabled by the BIOS
> >>>the PAT register of the boot processor won't be restored after resume.
> >>>
> >>>Can you check whether pr_info("MTRR: Disabled\n") has been executed in
> >>>early boot? If yes, this might be a BIOS option.
> >>>
> >>
> >>I don't have access right now. I will test it later tonight (This is my
> >>home machine).
> >>
> >>Would $ dmesg | grep -i mtrr suffice, or do I need to look for the mtrr
> >>somewhere else, e.g. /proc, /sys, etc.?
> >
> >I think grepping for MTRR in dmesg should be enough.
> 
> Kernel 4.3 + nopat also died on the 4th or the 5th hibernate at the familiar (see previously attached image) "Calling lapic..." place.
> 
> $ dmesg | grep -i mtr for 4.3 kernel with nopat
> [    0.189113] calling  mtrr_if_init+0x0/0x5f @ 1
> [    0.189116] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [    0.189222] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> [    0.189559] calling  mtrr_init_finialize+0x0/0x3a @ 1
> [    0.189560] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> [    8.994140] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
> [    8.994154] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.

It's not clear from the log who made the MTRR call for WC that failed. I
hope we didn't attempt a WC write on a WB region. Who owns
00000000e0000000-00000000efffffff?

What does your log show right before and after this? To find out try:

dmesg | grep -5 -i mtrr  
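
To complement the log, /proc/iomem can help answer the ownership question.
Below is a minimal sketch, not a definitive tool: it assumes the usual
"start-end : name" layout of /proc/iomem with two spaces of indentation per
nesting level, and the helper name iomem_owners is made up for illustration.
On newer kernels the file may need to be read as root to show real addresses.

```python
import re

# Assumed /proc/iomem line layout: "  e0000000-efffffff : 0000:00:02.0"
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def iomem_owners(text, start, end):
    """Return (depth, lo, hi, name) for each resource overlapping [start, end].

    'text' is the content of /proc/iomem; iomem_owners is a hypothetical
    helper name, not an existing API.
    """
    owners = []
    for line in text.splitlines():
        m = IOMEM_LINE.match(line)
        if not m:
            continue
        lo, hi = int(m.group(2), 16), int(m.group(3), 16)
        # Two inclusive ranges overlap when each starts before the other ends.
        if lo <= end and hi >= start:
            depth = len(m.group(1)) // 2  # assumes two spaces per level
            owners.append((depth, lo, hi, m.group(4).strip()))
    return owners
```

For the range in question this would be called with the content of
/proc/iomem and the bounds 0xe0000000 and 0xefffffff; the innermost (deepest)
entry returned is usually the device that owns the range.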

Not being able to use WC is not fatal; it's just a performance issue. But if we
tried to switch a region to WC that we should not have, one that the BIOS might
rely on not being WC, that could be a big issue.
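
To see which existing MTRR the failed request collided with, one can compare
the "base,size" pair from the "type mismatch" message against /proc/mtrr.
The following is only a sketch under assumptions: it expects the usual
"regNN: base=0x... (...MB), size=...MB, count=N: type" line layout, and the
helper name overlapping_mtrrs is hypothetical.

```python
import re

# Assumed /proc/mtrr line layout:
# "reg00: base=0x000000000 (    0MB), size= 4096MB, count=1: write-back"
MTRR_LINE = re.compile(
    r"^reg(\d+): base=0x([0-9a-fA-F]+) \(\s*\d+MB\), "
    r"size=\s*(\d+)([KM])B, count=\d+: (\S.*)$"
)


def overlapping_mtrrs(text, base, size):
    """Return (reg, first, last, type) for MTRRs overlapping [base, base+size-1].

    'text' is the content of /proc/mtrr; 'base' and 'size' are the two hex
    numbers from the "type mismatch for base,size" message.
    """
    end = base + size - 1  # e.g. 0xe0000000 + 0x10000000 - 1 == 0xefffffff
    hits = []
    for line in text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if not m:
            continue
        reg_base = int(m.group(2), 16)
        unit = 1 << 10 if m.group(4) == "K" else 1 << 20  # KB or MB
        reg_end = reg_base + int(m.group(3)) * unit - 1
        if reg_base <= end and reg_end >= base:
            hits.append((int(m.group(1)), reg_base, reg_end, m.group(5).strip()))
    return hits
```

With base 0xe0000000 and size 0x10000000 from the message above, any
write-back entry returned would be the "old" type the WC request conflicted
with.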

> $dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> [    0.189368] calling  mtrr_if_init+0x0/0x5f @ 1
> [    0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [    0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> [    0.189814] calling  mtrr_init_finialize+0x0/0x3a @ 1
> [    0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs

The fact that we don't see a conflict in the log doesn't mean an issue or
conflict didn't trigger. If PAT missed something the BIOS set up, the kernel
may have assumed it could do something that it was not actually able to. The
MTRR init code should pick up on this stuff and let the kernel PAT code know if
there could be a conflict, but if for some reason that was missed, that could
be an issue.

> I also checked my BIOS. I found nothing about mtrr. My BIOS manual is ftp://europe.asrock.com/Manual/H97%20Pro4.pdf. Can you see any option about MTRR?
> 
> Question: If we assume your theory about mtrr/pat is correct, wouldn't the lockup/hang/reboot happen every time the system goes through hibernate/resume? Can this assumption explain why the first hibernate/resume cycles in rapid succession after system boot work, while the long ones fail somewhat more consistently?
> 
> Note: With PAT enabled the system boots up significantly faster.
> 
> On the weekend I will return to 3.18-rc2 and try to verify that my bisection is correct. Second-guessing yourself is a terrible thing...
> 
> I will also try with nopat, run dmesg | grep -i mtr, and post the results.
> 
> Unless you have any other suggestions...

Bisection on the merge commit would help.

 Luis