Message-ID: <CALPaoCjAUOO=2wECEQF+weKsto5LNi1-8nVi_QTLs7B+fvRb5A@mail.gmail.com>
Date: Fri, 16 Dec 2022 14:54:46 +0100
From: Peter Newman <peternewman@...gle.com>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
James Morse <james.morse@....com>,
Shaopeng Tan <tan.shaopeng@...itsu.com>,
Jamie Iles <quic_jiles@...cinc.com>,
linux-kernel@...r.kernel.org, eranian@...gle.com,
Babu Moger <Babu.Moger@....com>
Subject: Re: [PATCH] x86/resctrl: Fix event counts regression in reused RMIDs
Hi Reinette,

On Wed, Dec 14, 2022 at 8:17 PM Reinette Chatre
<reinette.chatre@...el.com> wrote:
> On 12/14/2022 6:21 AM, Peter Newman wrote:
> > mbm_state is arch-independent, so I think putting it here would require
> > the MPAM version to copy this and for get_mbm_state() to be exported.
>
> You are correct, it is arch independent ... so every arch is expected to
> have it.
> I peeked at your series and that looks good also - having cleanup done in
> a central place helps to avoid future mistakes.
>
> >> 	am = get_arch_mbm_state(hw_dom, rmid, eventid);
> >> 	if (am) {
> >> 		memset(am, 0, sizeof(*am));
> >> 		/* Record any initial, non-zero count value. */
> >> 		ret = __rmid_read(rmid, eventid, &val);
> >> 		if (!ret)
> >> 			am->prev_msr = val;
> >> 	}
> >>
> >> }
> >>
> >> Having this would be helpful as reference to Babu's usage.
> >
> > His usage looks a little different.
> >
> > According to the comment in Babu's patch:
> >
> > https://lore.kernel.org/lkml/166990903030.17806.5106229901730558377.stgit@bmoger-ubuntu/
> >
> > +	/*
> > +	 * When an Event Configuration is changed, the bandwidth counters
> > +	 * for all RMIDs and Events will be cleared by the hardware. The
> > +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> > +	 * every RMID on the next read to any event for every RMID.
> > +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> > +	 * cleared while it is tracked by the hardware. Clear the
> > +	 * mbm_local and mbm_total counts for all the RMIDs.
> > +	 */
> > +	resctrl_arch_reset_rmid_all(r, d);
> >
> > If all the hardware counters are zeroed as the comment suggests, then
> > leaving am->prev_msr zero seems correct. __rmid_read() would likely
> > return an error anyways. The bug I was addressing was one of reusing
> > an RMID which had not been reset.
>
> You are correct, but there are two things to keep in mind though:
> * the change from which you copied the above snippet introduces a new
>   _generic_ utility far away from this call site. It is thus reasonable to
>   assume that this utility should work for all use cases, not just the one
>   for which it is created. Since there are no other use cases at this time,
>   this may be ok, but I think at minimum the utility will benefit from
>   a snippet indicating the caveats of its use as a heads up to any future users.
> * the utility does not clear struct mbm_state contents. Again, this is ok
>   for this usage since AMD does not support the software controller but
>   as far as a generic utility goes the usage should be clear to avoid
>   traps for future changes.

To this end, would it help if I pulled the rr->first case into a
separate function like this:

-	resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
-	m = get_mbm_state(rr->d, rmid, rr->evtid);
-	if (m)
-		memset(m, 0, sizeof(struct mbm_state));
+	resctrl_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
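
As a rough illustration only, the helper could be shaped something like
the sketch below. Everything here is a simplified userspace stand-in so
the snippet compiles on its own: the struct layouts, NUM_RMIDS, and the
fake_hw_count array (a made-up substitute for reading the counter MSR)
are assumptions, not the kernel definitions. In the kernel the arch
state also lives in the hw domain rather than in struct rdt_domain.

```c
#include <stdint.h>
#include <string.h>

#define NUM_RMIDS 4	/* arbitrary size for this sketch */

/* Event IDs as used by resctrl for the two MBM events. */
enum resctrl_event_id {
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
};

/* Simplified stand-ins; the real structs have different members. */
struct mbm_state { uint64_t prev_bw_bytes; uint32_t prev_bw; };
struct arch_mbm_state { uint64_t chunks; uint64_t prev_msr; };
struct rdt_resource { int unused; };

struct rdt_domain {
	struct mbm_state mbm_total[NUM_RMIDS];
	struct mbm_state mbm_local[NUM_RMIDS];
	struct arch_mbm_state arch_mbm_total[NUM_RMIDS];
	struct arch_mbm_state arch_mbm_local[NUM_RMIDS];
	uint64_t fake_hw_count[NUM_RMIDS];	/* hypothetical: models the HW counter */
};

/* Arch-independent per-RMID state, or NULL for non-MBM events. */
static struct mbm_state *get_mbm_state(struct rdt_domain *d, uint32_t rmid,
				       enum resctrl_event_id evtid)
{
	if (rmid >= NUM_RMIDS)
		return NULL;
	switch (evtid) {
	case QOS_L3_MBM_TOTAL_EVENT_ID: return &d->mbm_total[rmid];
	case QOS_L3_MBM_LOCAL_EVENT_ID: return &d->mbm_local[rmid];
	default: return NULL;
	}
}

/* Arch-specific per-RMID state, or NULL for non-MBM events. */
static struct arch_mbm_state *get_arch_mbm_state(struct rdt_domain *d,
						 uint32_t rmid,
						 enum resctrl_event_id evtid)
{
	if (rmid >= NUM_RMIDS)
		return NULL;
	switch (evtid) {
	case QOS_L3_MBM_TOTAL_EVENT_ID: return &d->arch_mbm_total[rmid];
	case QOS_L3_MBM_LOCAL_EVENT_ID: return &d->arch_mbm_local[rmid];
	default: return NULL;
	}
}

/* Arch side: zero the state, then record any initial, non-zero count. */
static void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
				    uint32_t rmid, enum resctrl_event_id evtid)
{
	struct arch_mbm_state *am = get_arch_mbm_state(d, rmid, evtid);

	(void)r;
	if (am) {
		memset(am, 0, sizeof(*am));
		am->prev_msr = d->fake_hw_count[rmid];
	}
}

/* Proposed wrapper: reset the arch state and the generic mbm_state together. */
static void resctrl_reset_rmid(struct rdt_resource *r, struct rdt_domain *d,
			       uint32_t rmid, enum resctrl_event_id evtid)
{
	struct mbm_state *m;

	resctrl_arch_reset_rmid(r, d, rmid, evtid);
	m = get_mbm_state(d, rmid, evtid);
	if (m)
		memset(m, 0, sizeof(*m));
}
```

The point of the wrapper is that a caller cannot reset the arch state
and forget the struct mbm_state contents, which is the trap mentioned
above for generic users.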

I'm open to suggestions on the name.

-Peter