Message-ID: <d1d02383-19fd-20c6-b6b7-0f769bc40582@arm.com>
Date: Mon, 17 Jul 2023 18:07:21 +0100
From: James Morse <james.morse@....com>
To: Peter Newman <peternewman@...gle.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Fenghua Yu <fenghua.yu@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
Babu Moger <Babu.Moger@....com>,
shameerali.kolothum.thodi@...wei.com,
D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
xingxin.hx@...nanolis.org, baolin.wang@...ux.alibaba.com,
Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, dfustini@...libre.com
Subject: Re: [PATCH v4 12/24] x86/resctrl: Make resctrl_arch_rmid_read() retry
when it is interrupted
Hi Peter,
On 6/7/23 13:51, Peter Newman wrote:
> On Tue, Jun 6, 2023 at 7:03 PM James Morse <james.morse@....com> wrote:
>> On 06/06/2023 09:49, Peter Newman wrote:
>>> It looks like if __rmid_read() is interrupted by an occupancy counter
>>> read between writing QM_EVTSEL and reading QM_CTR, it will not perform
>>> any update to am->prev_msr, and the interrupted read will return the
>>> same counter value as in the interrupting read.
>>
>> Yup, that's a problem. I was only looking at the mbm state in memory, not the CPU register.
>> I think the fix is to read back QM_EVTSEL after reading QM_CTR. I'll do this in
>> __rmid_read() to avoid returning -EINTR. It creates two retry loops which is annoying, but
>> making the window larger means you're more likely to see false positives.
>>
>> ----------------------------%<----------------------------
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index e24390d2e661..aeba035bb680 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -101,6 +101,7 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
>>
>>  static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
>>  {
>> +	u32 _rmid, _eventid;
>>  	u64 msr_val;
>>
>>  	/*
>> @@ -110,9 +111,15 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
>>  	 * IA32_QM_CTR.data (bits 61:0) reports the monitored data.
>>  	 * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
>>  	 * are error bits.
>> +	 * QM_EVTSEL is re-read to detect if this function was interrupted by
>> +	 * another call, meaning the QM_CTR value may belong to a different
>> +	 * event.
>>  	 */
>> -	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
>> -	rdmsrl(MSR_IA32_QM_CTR, msr_val);
>> +	do {
>> +		wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
>> +		rdmsrl(MSR_IA32_QM_CTR, msr_val);
>> +		rdmsr(MSR_IA32_QM_EVTSEL, _eventid, _rmid);
>> +	} while (eventid != _eventid || rmid != _rmid);
>>
>>  	if (msr_val & RMID_VAL_ERROR)
>>  		return -EIO;
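(To spell out the interleaving Peter describes, which the read-back above catches:

	task A:  wrmsr(MSR_IA32_QM_EVTSEL, evtA, rmidA)
	           -> interrupt: wrmsr(MSR_IA32_QM_EVTSEL, evtB, rmidB)
	                         rdmsrl(MSR_IA32_QM_CTR, ...)  /* B's value */
	task A:  rdmsrl(MSR_IA32_QM_CTR, ...)  /* also B's value! */

With the read-back, task A sees QM_EVTSEL no longer holds evtA/rmidA and
retries, instead of handing B's count to A's caller.)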
> I happen to be tracking the cost of resctrl_arch_rmid_read() calls, so
> I measured the impact of your fix on my AMD EPYC 7B12:
>
> with both this and the soft RMID series[1] applied:
> The Soft RMID switches contain two __rmid_read() calls, so this
> implies each QM_EVTSEL read-back is around 420 cycles on this AMD
> implementation.
Oooer. I assumed writes might have tedious side-effects but reads would be cheap.
I suppose it's because another CPU may have modified this value in the meantime.
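For reference, the sort of harness that would show this (a hypothetical sketch, not
Peter's actual measurement code; it assumes somewhere __rmid_read() is callable with
preemption disabled, e.g. hacked into monitor.c):

------------------%<------------------
/*
 * Hypothetical: average the cost of back-to-back __rmid_read() calls
 * over a large batch, so a few-hundred-cycle difference between a
 * kernel with and without the QM_EVTSEL read-back is visible above
 * the noise of a single timed call. Needs <asm/msr.h> for
 * rdtsc_ordered() and <linux/math64.h> for div_u64().
 */
static u64 time_rmid_reads(u32 rmid, enum resctrl_event_id eventid)
{
	const u32 loops = 1000;
	u64 start, end, val;
	u32 i;

	start = rdtsc_ordered();
	for (i = 0; i < loops; i++)
		__rmid_read(rmid, eventid, &val);
	end = rdtsc_ordered();

	return div_u64(end - start, loops);	/* average cycles per call */
}
------------------%<------------------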
> Even if you don't agree with my plan to add resctrl_arch_rmid_read()
> calls to context switches, there should be cheaper ways to handle
> this.
Yup, I've swapped this for a sequence counter[0], which should push that cost into the noise.
Anything left will be the cost of the atomics.
Thanks,
James
[0] barely tested:
------------------%<------------------
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 238831d53479..86d3a1b99be6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -16,6 +16,7 @@
  */

 #include <linux/module.h>
+#include <linux/percpu.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>

@@ -24,6 +25,9 @@

 #include "internal.h"

+/* Sequence number for writes to IA32 QM_EVTSEL */
+static DEFINE_PER_CPU(u64, qm_evtsel_seq);
+
 struct rmid_entry {
 	/*
 	 * Some architectures's resctrl_arch_rmid_read() needs the CLOSID value
@@ -178,8 +182,7 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)

 static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
-	u32 _rmid, _eventid;
-	u64 msr_val;
+	u64 msr_val, seq;

 	/*
 	 * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
@@ -188,15 +191,16 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 	 * IA32_QM_CTR.data (bits 61:0) reports the monitored data.
 	 * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
 	 * are error bits.
-	 * QM_EVTSEL is re-read to detect if this function was interrupted by
-	 * another call, meaning the QM_CTR value may belong to a different
-	 * event.
+	 * A per-cpu sequence counter is incremented each time QM_EVTSEL is
+	 * written. This is used to detect if this function was interrupted by
+	 * another call without re-reading the MSRs. Retry the MSR read when
+	 * this happens as the QM_CTR value may belong to a different event.
 	 */
 	do {
+		seq = this_cpu_inc_return(qm_evtsel_seq);
 		wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
 		rdmsrl(MSR_IA32_QM_CTR, msr_val);
-		rdmsr(MSR_IA32_QM_EVTSEL, _eventid, _rmid);
-	} while (eventid != _eventid || rmid != _rmid);
+	} while (seq != this_cpu_read(qm_evtsel_seq));

 	if (msr_val & RMID_VAL_ERROR)
 		return -EIO;
------------------%<------------------
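The same pattern is easier to see outside the kernel. A minimal stand-alone
sketch of the idea (illustrative only: plain globals stand in for the per-cpu
counter and the MSRs, and all the names here are made up):

------------------%<------------------
/* build: cc -O2 -o seq_retry_demo seq_retry_demo.c */
#include <stdint.h>
#include <stdio.h>

static volatile uint64_t evtsel_seq;	/* stands in for qm_evtsel_seq */
static volatile uint64_t evtsel;	/* stands in for MSR_IA32_QM_EVTSEL */
static volatile uint64_t ctr = 42;	/* stands in for MSR_IA32_QM_CTR */

static uint64_t read_counter(uint64_t eventid)
{
	uint64_t seq, val;

	do {
		/* Bump the sequence number before selecting the event. */
		seq = ++evtsel_seq;
		evtsel = eventid;
		val = ctr;
		/*
		 * Re-check afterwards: an interrupting reader would have
		 * bumped the counter too, so 'val' may belong to its event
		 * rather than ours. Retry in that case.
		 */
	} while (seq != evtsel_seq);

	return val;
}

int main(void)
{
	printf("counter value: %llu\n",
	       (unsigned long long)read_counter(1));
	return 0;
}
------------------%<------------------

The point of the swap is that the common, uninterrupted path pays for a
per-cpu increment and a per-cpu load instead of an extra MSR read.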