linux-kernel - Re: [PATCH v1 3/9] x86/resctrl: Add resctrl_mbm_flush

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALPaoCgknY0pWkXaCayPx28otcA5=v5a5FFoLFx3At0cGvAykg@mail.gmail.com>
Date:   Tue, 16 May 2023 16:18:40 +0200
From:   Peter Newman <peternewman@...gle.com>
To:     Reinette Chatre <reinette.chatre@...el.com>
Cc:     Fenghua Yu <fenghua.yu@...el.com>, Babu Moger <babu.moger@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>,
        Stephane Eranian <eranian@...gle.com>,
        James Morse <james.morse@....com>,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v1 3/9] x86/resctrl: Add resctrl_mbm_flush_cpu() to
 collect CPUs' MBM events

Hi Reinette,

On Fri, May 12, 2023 at 3:25 PM Peter Newman <peternewman@...gle.com> wrote:
> On Thu, May 11, 2023 at 11:37 PM Reinette Chatre
> <reinette.chatre@...el.com> wrote:
> > On 4/21/2023 7:17 AM, Peter Newman wrote:
> > > +
> > > +     if (evtid == QOS_L3_MBM_LOCAL_EVENT_ID) {
> > > +             counter = &state->local;
> > > +     } else {
> > > +             WARN_ON(evtid != QOS_L3_MBM_TOTAL_EVENT_ID);
> > > +             counter = &state->total;
> > > +     }
> > > +
> > > +     /*
> > > +      * Propagate the value read from the hw_rmid assigned to the current CPU
> > > +      * into the "soft" rmid associated with the current task or CPU.
> > > +      */
> > > +     m = get_mbm_state(d, soft_rmid, evtid);
> > > +     if (!m)
> > > +             return;
> > > +
> > > +     if (resctrl_arch_rmid_read(r, d, hw_rmid, evtid, &val))
> > > +             return;
> > > +
> >
> > This all seems unsafe to run without protection. The code relies on
> > the rdt_domain but a CPU hotplug event could result in the domain
> > disappearing underneath this code. The accesses to the data structures
> > also appear unsafe to me. Note that resctrl_arch_rmid_read() updates
> > the architectural MBM state and this same state can be updated concurrently
> > in other code paths without appropriate locking.
>
> The domain is supposed to always be the current one, but I see that
> even a get_domain_from_cpu(smp_processor_id(), ...) call needs to walk
> a resource's domain list to find a matching entry, which could be
> concurrently modified when other domains are added/removed.
>
> Similarly, when soft RMIDs are enabled, it should not be possible to
> call resctrl_arch_rmid_read() outside of on the current CPU's HW RMID.
>
> I'll need to confirm whether it's safe to access the current CPU's
> rdt_domain in an atomic context. If it isn't, I assume I would have to
> arrange all of the state used during flush to be per-CPU.
>
> I expect the constraints on what data can be safely accessed where is
> going to constrain how the state is ultimately arranged, so I will
> need to settle this before I can come back to the other questions
> about mbm_state.

According to cpu_hotplug.rst, the startup callbacks are called before
a CPU is started and the teardown callbacks are called after the CPU
has become dysfunctional, so it should always be safe for a CPU to
access its own data, so all I need to do here is avoid walking domain
lists in resctrl_mbm_flush_cpu().

However, this also means that resctrl_{on,off}line_cpu() call
clear_closid_rmid() on a different CPU, so whichever CPU executes
these will zap its own pqr_state struct and PQR_ASSOC MSR.

-Peter