Date:   Sun, 18 Oct 2020 23:03:23 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Jeffrin Jose T <jeffrin@...agiritech.edu.in>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        jpoimboe@...hat.com, mbenes@...e.cz,
        "peterz@...radead.org" <peterz@...radead.org>,
        shile.zhang@...ux.alibaba.com, lkml <linux-kernel@...r.kernel.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        Shuah Khan <shuah@...nel.org>
Subject: Re: Fwd: [WARNING AND ERROR]  may be  system slow and  audio and
 video breaking

On Mon, Oct 19, 2020 at 01:51:34AM +0530, Jeffrin Jose T wrote:
> On Sun, 2020-10-18 at 19:49 +0200, Borislav Petkov wrote:
> > On Sun, Oct 18, 2020 at 10:42:39PM +0530, Jeffrin Jose T wrote:
> > > smpboot: Scheduler frequency invariance went wobbly, disabling!
> > > [ 1112.592866] unchecked MSR access error: RDMSR from 0x123 at rIP:
> > > 0xffffffffb5c9a184 (native_read_msr+0x4/0x30)

Ok, you forgot to say in your initial mail that this happens when you
suspend your laptop.

Now, this unchecked MSR error happens only once because, that early
during resume, the microcode on CPU1 has not been updated yet. That needs
to be debugged separately, and I'll try to reproduce it on my machine.
Since the microcode is not updated yet, the 0x123 MSR is not
"emulated" by the microcode, so to speak, hence the warning.

That warning doesn't happen anymore, though, once the microcode is
updated.

But what happens after that is that you get a flood of corrected PCIe
errors about a transaction to a device timing out:

pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00001000/00002000
pcieport 0000:00:1c.5:    [12] Timeout 

and it looks like that flood is slowing down the machine because it is
busy logging them.
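The "error status/mask=00001000/00002000" field in the log above is a pair of hex register values. Below is a minimal sketch for decoding them. It assumes the bit layout of the PCIe AER Correctable Error Status/Mask registers and the short names the Linux AER driver prints (the log's "[12] Timeout" matches this). The table is an assumption to check against the PCIe spec or drivers/pci/pcie/aer.c. The helper names are hypothetical.

```python
from typing import List, Tuple

# Assumed bit layout of the PCIe AER Correctable Error Status/Mask
# registers, with the short names the Linux AER driver prints.
AER_CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout (a Data Link Layer error)
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode(reg: int) -> List[Tuple[int, str]]:
    """Return (bit, name) for every set bit in a 32-bit correctable error register."""
    return [(bit, AER_CORRECTABLE_BITS.get(bit, "Unknown"))
            for bit in range(32) if reg & (1 << bit)]


def parse_status_mask(text: str) -> Tuple[int, int]:
    """Parse the hex values after 'status/mask=' (hypothetical helper)."""
    status, mask = text.split("=", 1)[1].split("/")
    return int(status, 16), int(mask, 16)


status, mask = parse_status_mask("error status/mask=00001000/00002000")
print("status:  ", decode(status))          # bit 12 set: Timeout
print("mask:    ", decode(mask))            # bit 13 set: NonFatalErr masked
print("reported:", decode(status & ~mask))  # Timeout is not masked
```

Read this way, only the Replay Timer Timeout bit is set. The mask covers only the Advisory Non-Fatal bit, so every timeout gets reported. That is consistent with the "type=Data Link Layer" in the log.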

Do

# lspci -nn -xxx

as root. It'll show us which device that 8086:9d15 is.
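With `-nn`, lspci appends the numeric "[vendor:device]" pair to each device line, which is how the 8086:9d15 from the log can be matched to a device. Below is a minimal sketch of that lookup. It assumes `lspci -nn` prints lowercase hex IDs in that bracketed form. The function name is hypothetical.

```python
import re
from typing import List

# Matches a bracketed "[vvvv:dddd]" vendor:device pair. The bracketed
# class code (e.g. "[0604]") has no colon, so it does not match.
ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def find_by_id(lspci_output: str, vendor: str, device: str) -> List[str]:
    """Return the lines of `lspci -nn` output whose vendor:device pair matches."""
    want = (vendor.lower(), device.lower())
    hits = []
    for line in lspci_output.splitlines():
        ids = ID_RE.findall(line)
        # The pair lspci appends comes last on the line; hex-dump lines
        # from -xxx contain no brackets and are skipped.
        if ids and ids[-1] == want:
            hits.append(line)
    return hits
```

In practice, lspci can filter by ID itself, e.g. `lspci -nn -d 8086:9d15`. The sketch only shows what that match looks like.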

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
