[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2056950f-02a8-41cc-9dd0-c16b81afd8e3@intel.com>
Date: Mon, 13 Oct 2025 11:16:41 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, "Moger, Babu" <bmoger@....com>, "Babu
Moger" <babu.moger@....com>, "Dave.Martin@....com" <Dave.Martin@....com>,
"james.morse@....com" <james.morse@....com>, "tglx@...utronix.de"
<tglx@...utronix.de>, "mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de"
<bp@...en8.de>, "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>
CC: "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"peternewman@...gle.com" <peternewman@...gle.com>, "Eranian, Stephane"
<eranian@...gle.com>, "gautham.shenoy@....com" <gautham.shenoy@....com>
Subject: Re: [PATCH v4] x86/resctrl: Fix miscount of bandwidth event when
reactivating previously Unavailable RMID
Hi Tony,
On 10/13/25 8:35 AM, Luck, Tony wrote:
>>> The behavior of the counter is different on Intel where there are enough
>>> counters backing the RMID and the "Unavailable" bit is not set when counter
>>> starts counting but instead the counter returns "0". For example, when
>
> Note that the h/w counter doesn't really return "0" (except for the first time
> after CPU reset).
Correct.
In this example both the hardware counter and the event returned zero. The
main point was that it does not return "Unavailable".
The goal with the example related to this issue was to demonstrate no impact on Intel
when resetting arch_mbm_state::prev_msr on receipt of "Unavailable". Do you see things
differently?
>
>>> running equivalent of "step 1" on an Intel system it looks like:
>>>
>>> # cd /sys/fs/resctrl
>>> # mkdir mon_groups/test1
>
> While making the directory mon_add_all_files() does this:
>
> if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
> mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true);
>
> Which in __mon_event_count() does:
>
> if (rr->first) {
> if (rr->is_mbm_cntr)
> resctrl_arch_reset_cntr(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
> else
> resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
> m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
> if (m)
> memset(m, 0, sizeof(struct mbm_state));
> return 0;
> }
>
> If you dig into resctrl_arch_reset_rmid() you will see that it reads the h/w counter and
> then that becomes the start point for subsequent values reported when a user reads
> from the resctrl event file.
I believe resctrl_arch_reset_rmid() already addresses the issue since resctrl_arch_reset_rmid()
always resets the architectural state before attempting to read the RMID. If __rmid_read_phys()
encounters "Unavailable"/-EINVAL then it is fine since arch_mbm_state::prev_msr will already be
zero and thus ready for a subsequent resctrl_arch_rmid_read(), whether hardware counter is ready
or not.
Reinette
Powered by blists - more mailing lists