Message-ID: <d1d02383-19fd-20c6-b6b7-0f769bc40582@arm.com>
Date: Mon, 17 Jul 2023 18:07:21 +0100
From: James Morse <james.morse@....com>
To: Peter Newman <peternewman@...gle.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Fenghua Yu <fenghua.yu@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
Babu Moger <Babu.Moger@....com>,
shameerali.kolothum.thodi@...wei.com,
D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
xingxin.hx@...nanolis.org, baolin.wang@...ux.alibaba.com,
Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, dfustini@...libre.com
Subject: Re: [PATCH v4 12/24] x86/resctrl: Make resctrl_arch_rmid_read() retry
when it is interrupted
Hi Peter,
On 6/7/23 13:51, Peter Newman wrote:
> On Tue, Jun 6, 2023 at 7:03 PM James Morse <james.morse@....com> wrote:
>> On 06/06/2023 09:49, Peter Newman wrote:
>>> It looks like if __rmid_read() is interrupted by an occupancy counter
>>> read between writing QM_EVTSEL and reading QM_CTR, it will not perform
>>> any update to am->prev_msr, and the interrupted read will return the
>>> same counter value as in the interrupting read.
>>
>> Yup, that's a problem. I was only looking at the mbm state in memory, not the CPU register.
>> I think the fix is to read back QM_EVTSEL after reading QM_CTR. I'll do this in
>> __rmid_read() to avoid returning -EINTR. It creates two retry loops which is annoying, but
>> making the window larger means you're more likely to see false positives.
>>
>> ----------------------------%<----------------------------
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index e24390d2e661..aeba035bb680 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -101,6 +101,7 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
>>
>>  static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
>>  {
>> +	u32 _rmid, _eventid;
>>  	u64 msr_val;
>>
>>  	/*
>> @@ -110,9 +111,15 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
>>  	 * IA32_QM_CTR.data (bits 61:0) reports the monitored data.
>>  	 * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
>>  	 * are error bits.
>> +	 * QM_EVTSEL is re-read to detect if this function was interrupted by
>> +	 * another call, meaning the QM_CTR value may belong to a different
>> +	 * event.
>>  	 */
>> -	wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
>> -	rdmsrl(MSR_IA32_QM_CTR, msr_val);
>> +	do {
>> +		wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
>> +		rdmsrl(MSR_IA32_QM_CTR, msr_val);
>> +		rdmsr(MSR_IA32_QM_EVTSEL, _eventid, _rmid);
>> +	} while (eventid != _eventid || rmid != _rmid);
>>
>>  	if (msr_val & RMID_VAL_ERROR)
>>  		return -EIO;
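(To spell out the interleaving Peter describes, which the read-back above catches:

	task A:  wrmsr(MSR_IA32_QM_EVTSEL, evtA, rmidA)
	           -> interrupt: wrmsr(MSR_IA32_QM_EVTSEL, evtB, rmidB)
	                         rdmsrl(MSR_IA32_QM_CTR, ...)  /* B's value */
	task A:  rdmsrl(MSR_IA32_QM_CTR, ...)  /* also B's value! */

With the read-back, task A sees QM_EVTSEL no longer holds evtA/rmidA and
retries, instead of handing B's count to A's caller.)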
> I happen to be tracking the cost of resctrl_arch_rmid_read() calls, so
> I measured the impact of your fix on my AMD EPYC 7B12:
>
> with both this and the soft RMID series[1] applied:
> The Soft RMID switches contain two __rmid_read() calls, so this
> implies each QM_EVTSEL read-back is around 420 cycles on this AMD
> implementation.
Oooer. I assumed writes might have tedious side-effects but reads would be cheap.
I suppose it's because another CPU may have modified this value in the meantime.
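For reference, the sort of harness that would show this (a hypothetical sketch, not
Peter's actual measurement code; it assumes somewhere __rmid_read() is callable with
preemption disabled, e.g. hacked into monitor.c):

------------------%<------------------
/*
 * Hypothetical: average the cost of back-to-back __rmid_read() calls
 * over a large batch, so a few-hundred-cycle difference between a
 * kernel with and without the QM_EVTSEL read-back is visible above
 * the noise of a single timed call. Needs <asm/msr.h> for
 * rdtsc_ordered() and <linux/math64.h> for div_u64().
 */
static u64 time_rmid_reads(u32 rmid, enum resctrl_event_id eventid)
{
	const u32 loops = 1000;
	u64 start, end, val;
	u32 i;

	start = rdtsc_ordered();
	for (i = 0; i < loops; i++)
		__rmid_read(rmid, eventid, &val);
	end = rdtsc_ordered();

	return div_u64(end - start, loops);	/* average cycles per call */
}
------------------%<------------------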
> Even if you don't agree with my plan to add resctrl_arch_rmid_read()
> calls to context switches, there should be cheaper ways to handle
> this.
Yup, I've swapped this for a sequence counter[0], which should push that cost into the noise.
Anything left will be the cost of the atomics.
Thanks,
James
[0] barely tested:
------------------%<------------------
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 238831d53479..86d3a1b99be6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -16,6 +16,7 @@
  */

 #include <linux/module.h>
+#include <linux/percpu.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>

@@ -24,6 +25,9 @@

 #include "internal.h"

+/* Sequence number for writes to IA32 QM_EVTSEL */
+static DEFINE_PER_CPU(u64, qm_evtsel_seq);
+
 struct rmid_entry {
 	/*
 	 * Some architectures's resctrl_arch_rmid_read() needs the CLOSID value
@@ -178,8 +182,7 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)

 static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 {
-	u32 _rmid, _eventid;
-	u64 msr_val;
+	u64 msr_val, seq;

 	/*
 	 * As per the SDM, when IA32_QM_EVTSEL.EvtID (bits 7:0) is configured
@@ -188,15 +191,16 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
 	 * IA32_QM_CTR.data (bits 61:0) reports the monitored data.
 	 * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
 	 * are error bits.
-	 * QM_EVTSEL is re-read to detect if this function was interrupted by
-	 * another call, meaning the QM_CTR value may belong to a different
-	 * event.
+	 * A per-cpu sequence counter is incremented each time QM_EVTSEL is
+	 * written. This is used to detect if this function was interrupted by
+	 * another call without re-reading the MSRs. Retry the MSR read when
+	 * this happens as the QM_CTR value may belong to a different event.
 	 */
 	do {
+		seq = this_cpu_inc_return(qm_evtsel_seq);
 		wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
 		rdmsrl(MSR_IA32_QM_CTR, msr_val);
-		rdmsr(MSR_IA32_QM_EVTSEL, _eventid, _rmid);
-	} while (eventid != _eventid || rmid != _rmid);
+	} while (seq != this_cpu_read(qm_evtsel_seq));

 	if (msr_val & RMID_VAL_ERROR)
 		return -EIO;
------------------%<------------------
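The same pattern is easier to see outside the kernel. A minimal stand-alone
sketch of the idea (illustrative only: plain globals stand in for the per-cpu
counter and the MSRs, and all the names here are made up):

------------------%<------------------
/* build: cc -O2 -o seq_retry_demo seq_retry_demo.c */
#include <stdint.h>
#include <stdio.h>

static volatile uint64_t evtsel_seq;	/* stands in for qm_evtsel_seq */
static volatile uint64_t evtsel;	/* stands in for MSR_IA32_QM_EVTSEL */
static volatile uint64_t ctr = 42;	/* stands in for MSR_IA32_QM_CTR */

static uint64_t read_counter(uint64_t eventid)
{
	uint64_t seq, val;

	do {
		/* Bump the sequence number before selecting the event. */
		seq = ++evtsel_seq;
		evtsel = eventid;
		val = ctr;
		/*
		 * Re-check afterwards: an interrupting reader would have
		 * bumped the counter too, so 'val' may belong to its event
		 * rather than ours. Retry in that case.
		 */
	} while (seq != evtsel_seq);

	return val;
}

int main(void)
{
	printf("counter value: %llu\n",
	       (unsigned long long)read_counter(1));
	return 0;
}
------------------%<------------------

The point of the swap is that the common, uninterrupted path pays for a
per-cpu increment and a per-cpu load instead of an extra MSR read.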