[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALPaoCi94amaO4ALGhLPn=zWEJ3STJWyYid4L+kMjXYf9wKqAQ@mail.gmail.com>
Date: Thu, 14 Nov 2024 11:31:25 +0100
From: Peter Newman <peternewman@...gle.com>
To: Tony Luck <tony.luck@...el.com>
Cc: Reinette Chatre <reinette.chatre@...el.com>, Fenghua Yu <fenghua.yu@...el.com>,
Jonathan Corbet <corbet@....net>, Shuah Khan <skhan@...uxfoundation.org>, x86@...nel.org,
James Morse <james.morse@....com>, Jamie Iles <quic_jiles@...cinc.com>,
Babu Moger <babu.moger@....com>, Randy Dunlap <rdunlap@...radead.org>,
"Shaopeng Tan (Fujitsu)" <tan.shaopeng@...itsu.com>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [PATCH v8 3/7] x86/resctrl: Refactor mbm_update()
Hi Tony,
On Wed, Nov 13, 2024 at 11:58 PM Tony Luck <tony.luck@...el.com> wrote:
>
> On Wed, Nov 13, 2024 at 02:25:53PM -0800, Reinette Chatre wrote:
> > Hi Tony,
> >
> > On 10/29/24 10:28 AM, Tony Luck wrote:
> > > Computing memory bandwidth for all enabled events resulted in
> > > identical code blocks for total and local bandwidth in mbm_update().
> > >
> > > Refactor with a helper function to eliminate code duplication.
> > >
> > > No functional change.
> > >
> > > Signed-off-by: Tony Luck <tony.luck@...el.com>
> > > ---
> > > arch/x86/kernel/cpu/resctrl/monitor.c | 69 ++++++++++-----------------
> > > 1 file changed, 24 insertions(+), 45 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> > > index 3ef339e405c2..1b6cb3bbc008 100644
> > > --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> > > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> > > @@ -829,62 +829,41 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
> > > resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
> > > }
> > >
> > > -static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
> > > - u32 closid, u32 rmid)
> > > +static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
> > > + u32 closid, u32 rmid, enum resctrl_event_id evtid)
> > > {
> > > struct rmid_read rr = {0};
> > >
> > > rr.r = r;
> > > rr.d = d;
> > > + rr.evtid = evtid;
> > > + rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
> > > + if (IS_ERR(rr.arch_mon_ctx)) {
> > > + pr_warn_ratelimited("Failed to allocate monitor context: %ld",
> > > + PTR_ERR(rr.arch_mon_ctx));
> > > + return;
> > > + }
> > > +
> > > + __mon_event_count(closid, rmid, &rr);
> > > +
> > > + if (is_mba_sc(NULL))
> > > + mbm_bw_count(closid, rmid, &rr);
> > > +
> >
> > As I am staring at this more there seems to be an existing issue here ... note how
> > __mon_event_count()'s return value is not checked before mbm_bw_count() is called.
> > This means that mbm_bw_count() may run with rr.val of 0 that results in wraparound
> > inside it resulting in some unexpected bandwidth numbers. Since a counter read can fail
> > with a "Unavailable"/"Error" from hardware it is not deterministic how frequently this
> > issue can be encountered.
> >
> > Skipping mbm_bw_count() if rr.val is 0 is one option ... that would keep the bandwidth
> > measurement static at whatever was the last successful read and thus not cause dramatic
> > changes by the software controller ... setting bandwidth to 0 if rr.val is 0 is another
> > option to reflect that bandwidth data is unavailable, but then the software controller should
> > perhaps get signal to not make adjustments? I expect there are better options? What do
> > you think?
>
> Skipping mbm_bw_count() is also undesirable. If some later
> __mon_event_count() does succeed the bandwidth will be computed
> based on the last and current values as if they were one second
> apart, when actually some longer interval elapsed.
>
> I don't think this is a big issue for current Intel CPU RDT
> implementations because I don't think they will return the
> bit 62 unavailable value in the IA32_QM_CTR MSR. I'll ask
> around to check.
>
> But it does mean that implementing the "summary bandwidth"
> file discussed in the other e-mail thread[1] may be more
> complex on systems that can return that a counter is
> unavailable. We'd have to keep track that two succesful
> counter reads occured, with a measure of the interval
> between them before reporting a value in the summary file.
At least for the purposes of reporting the mbps rate to userspace, my
users will record from the files one per second, so it would be fine
to just report Unavailable (or Unassigned when it's clear that the
error is because counter wasn't unassigned) whenever either the
current or previous reading was not successful. Then they can assume
the value or error reported always refers to the most
recently-completed one-second window.
-Peter
Powered by blists - more mailing lists