[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALPaoCgknY0pWkXaCayPx28otcA5=v5a5FFoLFx3At0cGvAykg@mail.gmail.com>
Date: Tue, 16 May 2023 16:18:40 +0200
From: Peter Newman <peternewman@...gle.com>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: Fenghua Yu <fenghua.yu@...el.com>, Babu Moger <babu.moger@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>,
Stephane Eranian <eranian@...gle.com>,
James Morse <james.morse@....com>,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v1 3/9] x86/resctrl: Add resctrl_mbm_flush_cpu() to
collect CPUs' MBM events
Hi Reinette,
On Fri, May 12, 2023 at 3:25 PM Peter Newman <peternewman@...gle.com> wrote:
> On Thu, May 11, 2023 at 11:37 PM Reinette Chatre
> <reinette.chatre@...el.com> wrote:
> > On 4/21/2023 7:17 AM, Peter Newman wrote:
> > > +
> > > + if (evtid == QOS_L3_MBM_LOCAL_EVENT_ID) {
> > > + counter = &state->local;
> > > + } else {
> > > + WARN_ON(evtid != QOS_L3_MBM_TOTAL_EVENT_ID);
> > > + counter = &state->total;
> > > + }
> > > +
> > > + /*
> > > + * Propagate the value read from the hw_rmid assigned to the current CPU
> > > + * into the "soft" rmid associated with the current task or CPU.
> > > + */
> > > + m = get_mbm_state(d, soft_rmid, evtid);
> > > + if (!m)
> > > + return;
> > > +
> > > + if (resctrl_arch_rmid_read(r, d, hw_rmid, evtid, &val))
> > > + return;
> > > +
> >
> > This all seems unsafe to run without protection. The code relies on
> > the rdt_domain but a CPU hotplug event could result in the domain
> > disappearing underneath this code. The accesses to the data structures
> > also appear unsafe to me. Note that resctrl_arch_rmid_read() updates
> > the architectural MBM state and this same state can be updated concurrently
> > in other code paths without appropriate locking.
>
> The domain is supposed to always be the current one, but I see that
> even a get_domain_from_cpu(smp_processor_id(), ...) call needs to walk
> a resource's domain list to find a matching entry, which could be
> concurrently modified when other domains are added/removed.
>
> Similarly, when soft RMIDs are enabled, it should not be possible to
> call resctrl_arch_rmid_read() outside of on the current CPU's HW RMID.
>
> I'll need to confirm whether it's safe to access the current CPU's
> rdt_domain in an atomic context. If it isn't, I assume I would have to
> arrange all of the state used during flush to be per-CPU.
>
> I expect the constraints on what data can be safely accessed where is
> going to constrain how the state is ultimately arranged, so I will
> need to settle this before I can come back to the other questions
> about mbm_state.
According to cpu_hotplug.rst, the startup callbacks are called before
a CPU is started and the teardown callbacks are called after the CPU
has become dysfunctional, so it should always be safe for a CPU to
access its own data, so all I need to do here is avoid walking domain
lists in resctrl_mbm_flush_cpu().
However, this also means that resctrl_{on,off}line_cpu() call
clear_closid_rmid() on a different CPU, so whichever CPU executes
these will zap its own pqr_state struct and PQR_ASSOC MSR.
-Peter
Powered by blists - more mailing lists