linux-kernel - Re: [PATCH v4] x86/resctrl: Fix miscount of bandwidth event when reactivating previously Unavailable RMID

Message-ID: <2056950f-02a8-41cc-9dd0-c16b81afd8e3@intel.com>
Date: Mon, 13 Oct 2025 11:16:41 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, "Moger, Babu" <bmoger@....com>, "Babu
 Moger" <babu.moger@....com>, "Dave.Martin@....com" <Dave.Martin@....com>,
	"james.morse@....com" <james.morse@....com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de"
	<bp@...en8.de>, "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>
CC: "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"peternewman@...gle.com" <peternewman@...gle.com>, "Eranian, Stephane"
	<eranian@...gle.com>, "gautham.shenoy@....com" <gautham.shenoy@....com>
Subject: Re: [PATCH v4] x86/resctrl: Fix miscount of bandwidth event when
 reactivating previously Unavailable RMID

Hi Tony,

On 10/13/25 8:35 AM, Luck, Tony wrote:
>>> The behavior of the counter is different on Intel where there are enough
>>> counters backing the RMID and the "Unavailable" bit is not set when counter
>>> starts counting but instead the counter returns "0". For example, when
> 
> Note that the h/w counter doesn't really return "0" (except for the first time
> after CPU reset).

Correct. In this example both the hardware counter and the event returned zero.
The main point was that the counter does not return "Unavailable".

The goal of the example in this thread was to demonstrate that resetting
arch_mbm_state::prev_msr on receipt of "Unavailable" has no impact on Intel.
Do you see things differently?

> 
>>> running equivalent of "step 1" on an Intel system it looks like:
>>>
>>>     # cd /sys/fs/resctrl
>>>     # mkdir mon_groups/test1
> 
> While making the directory mon_add_all_files() does this:
> 
>                 if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
>                         mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true);
> 
> Which in __mon_event_count() does:
> 
>         if (rr->first) {
>                 if (rr->is_mbm_cntr)
>                         resctrl_arch_reset_cntr(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
>                 else
>                         resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
>                 m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
>                 if (m)
>                         memset(m, 0, sizeof(struct mbm_state));
>                 return 0;
>         }
> 
> If you dig into resctrl_arch_reset_rmid() you will see that it reads the h/w counter and
> then that becomes the start point for subsequent values reported when a user reads
> from the resctrl event file.

I believe resctrl_arch_reset_rmid() already addresses the issue since it always resets
the architectural state before attempting to read the RMID. If __rmid_read_phys()
encounters "Unavailable"/-EINVAL then that is fine since arch_mbm_state::prev_msr will
already be zero and thus ready for a subsequent resctrl_arch_rmid_read(), whether the
hardware counter is ready or not.

Reinette
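
The reset-then-read ordering described above can be sketched as a small
user-space model. This is not the kernel code: every model_* name is
hypothetical, overflow handling and MSR access are omitted, and it assumes
the low-level read leaves its output untouched when the counter reports
"Unavailable".

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_EINVAL 22

/* Hypothetical stand-in for the hardware counter behind one RMID. */
struct model_hw_counter {
	uint64_t value;		/* raw counter value */
	bool unavailable;	/* models the "Unavailable" status */
};

/* Simplified stand-in for struct arch_mbm_state. */
struct model_arch_state {
	uint64_t chunks;
	uint64_t prev_msr;
};

/* Models __rmid_read_phys(): *val is only written on success (assumption). */
static int model_read_phys(const struct model_hw_counter *hw, uint64_t *val)
{
	if (hw->unavailable)
		return -MODEL_EINVAL;
	*val = hw->value;
	return 0;
}

/*
 * Models resctrl_arch_reset_rmid(): clear the state first, then try to
 * record a start point. If the read fails, prev_msr simply stays zero.
 */
static void model_reset_rmid(const struct model_hw_counter *hw,
			     struct model_arch_state *am)
{
	memset(am, 0, sizeof(*am));
	(void)model_read_phys(hw, &am->prev_msr);
}

/* Models resctrl_arch_rmid_read() without counter overflow handling. */
static int model_rmid_read(const struct model_hw_counter *hw,
			   struct model_arch_state *am, uint64_t *chunks)
{
	uint64_t msr_val;
	int ret = model_read_phys(hw, &msr_val);

	if (ret)
		return ret;

	am->chunks += msr_val - am->prev_msr;
	am->prev_msr = msr_val;
	*chunks = am->chunks;
	return 0;
}
```

In this model, resetting while the counter is "Unavailable" leaves prev_msr at
zero, so the first successful read afterwards accumulates from zero. Resetting
while the counter is readable records the current value as the start point, so
later reads report only the delta since the reset.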