lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZQHiJxHSSGNk+ul5@gmail.com>
Date:   Wed, 13 Sep 2023 09:24:07 -0700
From:   Breno Leitao <leitao@...ian.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>,
        Sandipan Das <sandipan.das@....com>, leit@...com,
        dcostantino@...a.com,
        "open list:PERFORMANCE EVENTS SUBSYSTEM" 
        <linux-perf-users@...r.kernel.org>,
        "open list:PERFORMANCE EVENTS SUBSYSTEM" 
        <linux-kernel@...r.kernel.org>, Jirka Hladky <jhladky@...hat.com>
Subject: Re: [PATCH] perf/x86/amd: Do not WARN on every IRQ

Hi Peter,

On Fri, Jun 16, 2023 at 03:29:54PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 16, 2023 at 04:53:15AM -0700, Breno Leitao wrote:
> > On some systems, the Performance Counter Global Status Register is
> > coming with reserved bits set, which causes the system to be unusable
> > if a simple `perf top` runs. The system hits the WARN() thousands times
> > while perf runs.
> > 
> > WARNING: CPU: 18 PID: 20608 at arch/x86/events/amd/core.c:944 amd_pmu_v2_handle_irq+0x1be/0x2b0
> > 
> > This happens because the "Performance Counter Global Status Register"
> > (PerfCntGlobalStatus) MSR has bit 7 set. Bit 7 should be reserved according
> > to the documentation (Figure 13-12 from "AMD64 Architecture Programmer’s
> > Manual, Volume 2: System Programming, 24593"[1]
> 
> Would it then not make more sense to mask out bit7 before:
> 
> +	status &= ~AMD_PMU_V2_GLOBAL_STATUS_RESERVED;
> 	if (!status)
> 		goto done;

Instead of masking `status` against AMD_PMU_V2_GLOBAL_STATUS_RESERVED
(AMD64_NUM_COUNTERS?), I opted for using the `amd_pmu_global_cntr_mask`
global variable because it seems to represent what the loop below is
iterating over:

	/* PMC Enable and Overflow bits for PerfCntrGlobal* registers */
	static u64 amd_pmu_global_cntr_mask __read_mostly;

Also, I think we want to WARN_ON_ONCE() if we see this problem. Right
now, it warns at every time we call this function, which makes the
machine unusable, but, warning it once could be helpful to figure out
there is something wrong with the machine/firmware.

Anyway, please let me know whatever is your preferred way and I will
submit a v2.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ