linux-kernel - RE: [PATCH v2 2/2] x86/resctrl: Don't workqueue local event counter reads

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <SJ1PR11MB6083BA9392D4B176FA2DA170FC5C2@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date: Thu, 7 Nov 2024 22:14:58 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: "Chatre, Reinette" <reinette.chatre@...el.com>, Peter Newman
	<peternewman@...gle.com>
CC: "Yu, Fenghua" <fenghua.yu@...el.com>, "babu.moger@....com"
	<babu.moger@....com>, "bp@...en8.de" <bp@...en8.de>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "Eranian,
 Stephane" <eranian@...gle.com>, "hpa@...or.com" <hpa@...or.com>,
	"james.morse@....com" <james.morse@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "mingo@...hat.com" <mingo@...hat.com>,
	"nert.pinx@...il.com" <nert.pinx@...il.com>, "tan.shaopeng@...itsu.com"
	<tan.shaopeng@...itsu.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH v2 2/2] x86/resctrl: Don't workqueue local event counter
 reads

> I think maybe the issue you are trying to address is a user assigning a counter
> and then reading the cached data and getting cached data from a previous
> configuration? Please note that in the current implementation the cached
> data is reset directly on counter assignment [1]. If a user assigns a new
> counter and then immediately reads cached data, then the cached data will
> reflect the assignment even if the overflow worker thread did not get a chance
> to run since the assignment.

The issue is that AMD's ABMC implementation resets counts when reassigning
h/w counters to events in resctrl groups.  If the process reading the counters is
not fully aware of h/w counter reassignment, insanity will occur.

E.g. read a counter:

$ cat mbm_local_bytes
123456789

The h/w counter for this event/group is reassigned elsewhere.

The h/w counter is assigned back to this event/group (and its count is reset).

$ cat mbm_local_bytes
23456

Bandwidth calculation sees traffic amount:
	 (23456 - 123456789) = -123433333
Oops. Negative!

-Tony