Open Source and information security mailing list archives
 
Message-ID: <20171012110311.7797e56c@alans-desktop>
Date:   Thu, 12 Oct 2017 11:03:11 +0100
From:   Alan Cox <gnomes@...rguk.ukuu.org.uk>
To:     Gabriel Krisman Bertazi <krisman@...labora.co.uk>
Cc:     tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
        x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86: handle MSR exception when setting energy perf bias

On Thu, 12 Oct 2017 01:30:07 -0300
Gabriel Krisman Bertazi <krisman@...labora.co.uk> wrote:

> On very rare occasions, immediately after a suspend, one of our
> SandyBridge CI boxes hits the exception below on CPU0 while trying to
> reconfigure the energy bias register.  As far as I can tell, this is
> unlikely to be a race in the kernel, since we have only one CPU online,
> no preemption and IRQs disabled, and it can only be reproduced on this
> specific SNB-2600, and only rarely.  It looks more like faulty hardware
> to me.
> 
> Still, we can handle this exception more gracefully to silence the CI
> by using the safe versions of the rdmsrl/wrmsrl wrappers.

Which means we would silently fail to discover real problems (like this
one) on systems with a bug. Your system appears to have a problem.
Whether it's firmware or Linux I don't know, but we should not be covering
it up silently in the hope that ignoring it makes it go away, especially
when doing so will hide other bugs in the future.

At the point where it occurs, dump bit 3 of ECX from CPUID leaf 6 on that
logical CPU. If that bit is set, you should have IA32_ENERGY_PERF_BIAS; if
it's clear, you don't. If it's clear here but set somewhere else (e.g. at
boot), then you have some hints as to what might be going on. If you do
see a mismatch, check whether it differs per core or something equally
insane.

Alan
