lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2056950f-02a8-41cc-9dd0-c16b81afd8e3@intel.com>
Date: Mon, 13 Oct 2025 11:16:41 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, "Moger, Babu" <bmoger@....com>, "Babu
 Moger" <babu.moger@....com>, "Dave.Martin@....com" <Dave.Martin@....com>,
	"james.morse@....com" <james.morse@....com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de"
	<bp@...en8.de>, "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>
CC: "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"peternewman@...gle.com" <peternewman@...gle.com>, "Eranian, Stephane"
	<eranian@...gle.com>, "gautham.shenoy@....com" <gautham.shenoy@....com>
Subject: Re: [PATCH v4] x86/resctrl: Fix miscount of bandwidth event when
 reactivating previously Unavailable RMID

Hi Tony,

On 10/13/25 8:35 AM, Luck, Tony wrote:
>>> The behavior of the counter is different on Intel where there are enough
>>> counters backing the RMID and the "Unavailable" bit is not set when counter
>>> starts counting but instead the counter returns "0". For example, when
> 
> Note that the h/w counter doesn't really return "0" (except for the first time
> after CPU reset).

Correct. 
In this example both the hardware counter and the event returned zero. The
main point was that it does not return "Unavailable".

The goal with the example related to this issue was to demonstrate no impact on Intel
when resetting arch_mbm_state::prev_msr on receipt of "Unavailable". Do you see things
differently?

> 
>>> running equivalent of "step 1" on an Intel system it looks like:
>>>
>>>     # cd /sys/fs/resctrl
>>>     # mkdir mon_groups/test1
> 
> While making the directory mon_add_all_files() does this:
> 
>                 if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
>                         mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true);
> 
> Which in __mon_event_count() does:
> 
>         if (rr->first) {
>                 if (rr->is_mbm_cntr)
>                         resctrl_arch_reset_cntr(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
>                 else
>                         resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
>                 m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
>                 if (m)
>                         memset(m, 0, sizeof(struct mbm_state));
>                 return 0;
>         }
> 
> If you dig into resctrl_arch_reset_rmid() you will see that it reads the h/w counter and
> then that becomes the start point for subsequent values reported when a user reads
> from the resctrl event file.

I believe resctrl_arch_reset_rmid() already addresses the issue since resctrl_arch_reset_rmid()
always resets the architectural state before attempting to read the RMID. If __rmid_read_phys()
encounters "Unavailable"/-EINVAL then it is fine since arch_mbm_state::prev_msr will already be
zero and thus ready for a subsequent resctrl_arch_rmid_read(), whether hardware counter is ready
or not.

Reinette


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ