[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <81826a4f-c3de-787f-8059-4808815b4800@amd.com>
Date: Wed, 20 Oct 2021 14:22:01 -0500
From: Babu Moger <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
James Morse <james.morse@....com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
shameerali.kolothum.thodi@...wei.com,
Jamie Iles <jamie@...iainc.com>,
D Scott Phillips OS <scott@...amperecomputing.com>,
lcherian@...vell.com, bobo.shaobowang@...wei.com,
tan.shaopeng@...itsu.com
Subject: Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
Hi Reinette,
On 10/20/21 1:15 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/19/2021 4:20 PM, Babu Moger wrote:
>> Hi James,
>>
>> On 10/1/21 11:02 AM, James Morse wrote:
>>> __rmid_read() selects the specified eventid and returns the counter
>>> value from the msr. The error handling is architecture specific, and
>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>
>>> Error handling should be handled by architecture specific code, as
>>> a different architecture may have different requirements. MPAM's
>>> counters can report that they are 'not ready', requiring a second
>>> read after a short delay. This should be hidden from resctrl.
>>>
>>> Make __rmid_read() the architecture specific function for reading
>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>> handling into it.
>>>
>>> Signed-off-by: James Morse <james.morse@....com>
>>> ---
>>> Changes since v1:
>>> * Return EINVAL from the impossible case in __mon_event_count() instead
>>> of an x86 hardware specific value.
>>> ---
>>> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 +--
>>> arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
>>> arch/x86/kernel/cpu/resctrl/monitor.c | 42 +++++++++++++++--------
>>> include/linux/resctrl.h | 1 +
>>> 4 files changed, 31 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> index 25baacd331e0..c8ca7184c6d9 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void
>>> *arg)
>>> mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>> - if (rr.val & RMID_VAL_ERROR)
>>> + if (rr.err == -EIO)
>>> seq_puts(m, "Error\n");
>>> - else if (rr.val & RMID_VAL_UNAVAIL)
>>> + else if (rr.err == -EINVAL)
>>> seq_puts(m, "Unavailable\n");
>>> else
>>> seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>
>> This patch breaks the earlier fix
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fh%3Dv5.15-rc6%26id%3D064855a69003c24bd6b473b367d364e418c57625&data=04%7C01%7Cbabu.moger%40amd.com%7C85219a5827114935cdaa08d993f59fa0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703505420472920%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yP8awDgGGZ%2BWj5ZItdTNJItTVuK828yGnibwq%2BrVaf0%3D&reserved=0
>>
>>
>> When the user reads the events on the default monitoring group with
>> multiple subgroups, the events on all subgroups are consolidated
>> together. In case if the last rmid read was resulted in error then whole
>> group will be reported as error. The err field needs to be cleared.
>>
>> Please add this patch to clear the error.
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 14bc843043da..0e4addf237ec 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -444,6 +444,8 @@ void mon_event_count(void *info)
>> /* Report error if none of rmid_reads are successful */
>> if (ret_val)
>> rr->val = ret_val;
>> + else
>> + rr->err = 0;
>> }
>>
>> /*
>>
>
> Good catch, thank you.
>
> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
> was taken into account by this patch and needs a bigger rework than the
> above fixup. For example, if I understand correctly ret_val is the error
> and rr->val no longer expected to contain the error after this patch. So
> keeping that assignment to rr->val is not correct.
Yes. You are right. rr->val is not expected to contain the error.
Hopefully, this should help.
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 14bc843043da..105d972cc511 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -441,9 +441,9 @@ void mon_event_count(void *info)
}
}
- /* Report error if none of rmid_reads are successful */
- if (ret_val)
- rr->val = ret_val;
+ /* Clear the error if at least one of the rmid reads succeed */
+ if (ret_val == 0)
+ rr->err = 0;
}
/*
Powered by blists - more mailing lists