Message-ID: <20210317084829.GA474581@gmail.com>
Date:   Wed, 17 Mar 2021 09:48:29 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Kim Phillips <kim.phillips@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Jiri Olsa <jolsa@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Tom Lendacky <thomas.lendacky@....com>, x86@...nel.org,
        lkml <linux-kernel@...r.kernel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Stanislav Kozina <skozina@...hat.com>,
        Michael Petlan <mpetlan@...hat.com>,
        Pierre Amadio <pamadio@...hat.com>, onatalen@...hat.com,
        darcari@...hat.com
Subject: Re: unknown NMI on AMD Rome


* Kim Phillips <kim.phillips@....com> wrote:

> On 3/16/21 2:53 PM, Peter Zijlstra wrote:
> > On Tue, Mar 16, 2021 at 04:45:02PM +0100, Jiri Olsa wrote:
> >> hi,
> >> when running 'perf top' on AMD Rome (/proc/cpuinfo below)
> >> with fedora 33 kernel 5.10.22-200.fc33.x86_64
> >>
> >> we got unknown NMI messages:
> >>
> >> [  226.700160] Uhhuh. NMI received for unknown reason 3d on CPU 90.
> >> [  226.700162] Do you have a strange power saving mode enabled?
> >> [  226.700163] Dazed and confused, but trying to continue
> >> [  226.769565] Uhhuh. NMI received for unknown reason 3d on CPU 84.
> >> [  226.769566] Do you have a strange power saving mode enabled?
> >> [  226.769567] Dazed and confused, but trying to continue
> >> [  226.769771] Uhhuh. NMI received for unknown reason 2d on CPU 24.
> >> [  226.769773] Do you have a strange power saving mode enabled?
> >> [  226.769774] Dazed and confused, but trying to continue
> >> [  226.812844] Uhhuh. NMI received for unknown reason 2d on CPU 23.
> >> [  226.812846] Do you have a strange power saving mode enabled?
> >> [  226.812847] Dazed and confused, but trying to continue
> >> [  226.893783] Uhhuh. NMI received for unknown reason 2d on CPU 27.
> >> [  226.893785] Do you have a strange power saving mode enabled?
> >> [  226.893786] Dazed and confused, but trying to continue
> >> [  226.900139] Uhhuh. NMI received for unknown reason 2d on CPU 40.
> >> [  226.900141] Do you have a strange power saving mode enabled?
> >> [  226.900143] Dazed and confused, but trying to continue
> >> [  226.908763] Uhhuh. NMI received for unknown reason 3d on CPU 120.
> >> [  226.908765] Do you have a strange power saving mode enabled?
> >> [  226.908766] Dazed and confused, but trying to continue
> >> [  227.751296] Uhhuh. NMI received for unknown reason 2d on CPU 83.
> >> [  227.751298] Do you have a strange power saving mode enabled?
> >> [  227.751299] Dazed and confused, but trying to continue
> >> [  227.752937] Uhhuh. NMI received for unknown reason 3d on CPU 23.
> >>
> >> also, when discussing this with Borislav, he managed to reproduce it
> >> easily on his AMD Rome machine
> >>
> >> any idea?
> > 
> > Kim is the AMD point person for this I think..
> 
> Since perf top invokes precision and therefore IBS,
> this looks like it's hitting erratum #1215:
> 
> https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
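
For anyone triaging a larger dmesg capture, the quoted messages can be
tallied per reason code and per CPU. This is only a rough sketch: the
regular expression is modelled on the lines quoted above, and the helper
name tally_unknown_nmis and the SAMPLE excerpt are illustrative, not part
of any existing tool.

```python
import re
from collections import Counter

# Pattern modelled on the console lines quoted in this thread (an assumption
# about the format; adjust it if your kernel prints the message differently).
NMI_RE = re.compile(
    r"NMI received for unknown reason ([0-9a-f]+) on CPU (\d+)\."
)


def tally_unknown_nmis(log_text):
    """Return (per-reason counts, per-CPU counts) for unknown-NMI lines."""
    by_reason = Counter()
    by_cpu = Counter()
    for match in NMI_RE.finditer(log_text):
        reason, cpu = match.group(1), int(match.group(2))
        by_reason[reason] += 1
        by_cpu[cpu] += 1
    return by_reason, by_cpu


# A few lines copied from the report above, used as sample input.
SAMPLE = """\
[  226.700160] Uhhuh. NMI received for unknown reason 3d on CPU 90.
[  226.769565] Uhhuh. NMI received for unknown reason 3d on CPU 84.
[  226.769771] Uhhuh. NMI received for unknown reason 2d on CPU 24.
[  226.812844] Uhhuh. NMI received for unknown reason 2d on CPU 23.
[  227.752937] Uhhuh. NMI received for unknown reason 3d on CPU 23.
"""

if __name__ == "__main__":
    by_reason, by_cpu = tally_unknown_nmis(SAMPLE)
    for reason, count in sorted(by_reason.items()):
        print(f"reason {reason}: {count}")
    for cpu, count in by_cpu.most_common():
        print(f"CPU {cpu}: {count}")
```

Feeding it the full output of dmesg instead of SAMPLE would show whether
the NMIs are spread across many CPUs, as they are in the report.
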

So:


  1215 IBS (Instruction Based Sampling) Counter Valid Value
  May be Incorrect After Exit From Core C6 (CC6) State

  Description

  If a core's IBS feature is enabled and configured to generate an interrupt, including NMI (Non-Maskable
  Interrupt), and the IBS counter overflows during the entry into the Core C6 (CC6) state, the interrupt may be
  issued, but an invalid value of the valid bit may be restored when the core exits CC6.

  Potential Effect on System

  The operating system may receive interrupts due to an IBS counter event, including NMI, and not observe a
  valid IBS register. Console messages indicating "NMI received for unknown reason" have been observed on
  Linux systems.

  Suggested Workaround: None
  Fix Planned: No fix planned
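
To make the failure mode concrete, here is a toy model of it. This is not
kernel code: IBS_VALID, ibs_handler, dispatch_nmi and restore_after_cc6 are
made-up names, and treating the erratum as "the valid bit comes back
cleared" is my reading of the description above. The idea is only that a
handler which claims an NMI by checking a valid bit will decline it once
that bit is lost, so the NMI ends up unclaimed.

```python
# Toy model only: all names here are hypothetical, not kernel identifiers.
IBS_VALID = 1 << 0  # stand-in for the "valid" bit of an IBS control register


def ibs_handler(ibs_ctl):
    """Claim the NMI only if the register reports a valid IBS sample."""
    return bool(ibs_ctl & IBS_VALID)


def restore_after_cc6(ibs_ctl, erratum_hits):
    """Model CC6 exit: if the erratum hits, the valid bit is restored cleared."""
    return ibs_ctl & ~IBS_VALID if erratum_hits else ibs_ctl


def dispatch_nmi(ibs_ctl, other_handlers=()):
    """Return 'handled' if any handler claims the NMI, otherwise 'unknown'."""
    handlers = (lambda: ibs_handler(ibs_ctl),) + tuple(other_handlers)
    if any(handler() for handler in handlers):
        return "handled"
    return "unknown"


if __name__ == "__main__":
    pending = IBS_VALID | 0x100  # counter overflowed, sample marked valid
    print(dispatch_nmi(restore_after_cc6(pending, erratum_hits=False)))
    print(dispatch_nmi(restore_after_cc6(pending, erratum_hits=True)))
```

In the second call the interrupt has already been issued but no handler
recognises it, which corresponds to the "unknown reason" messages in the
report.
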

lovely.

Thanks,

	Ingo
