[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <71efc804-4ca0-f31b-4485-005b3ad6e4d4@intel.com>
Date: Wed, 20 Oct 2021 13:28:50 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <babu.moger@....com>, James Morse <james.morse@....com>,
<x86@...nel.org>, <linux-kernel@...r.kernel.org>
CC: Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
<shameerali.kolothum.thodi@...wei.com>,
Jamie Iles <jamie@...iainc.com>,
D Scott Phillips OS <scott@...amperecomputing.com>,
<lcherian@...vell.com>, <bobo.shaobowang@...wei.com>,
<tan.shaopeng@...itsu.com>
Subject: Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
Hi Babu,
On 10/20/2021 12:22 PM, Babu Moger wrote:
> On 10/20/21 1:15 PM, Reinette Chatre wrote:
>> On 10/19/2021 4:20 PM, Babu Moger wrote:
>>> On 10/1/21 11:02 AM, James Morse wrote:
>>>> __rmid_read() selects the specified eventid and returns the counter
>>>> value from the msr. The error handling is architecture specific, and
>>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>>
>>>> Error handling should be handled by architecture specific code, as
>>>> a different architecture may have different requirements. MPAM's
>>>> counters can report that they are 'not ready', requiring a second
>>>> read after a short delay. This should be hidden from resctrl.
>>>>
>>>> Make __rmid_read() the architecture specific function for reading
>>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>>> handling into it.
>>>>
>>>> Signed-off-by: James Morse <james.morse@....com>
>>>> ---
>>>> Changes since v1:
>>>> * Return EINVAL from the impossible case in __mon_event_count() instead
>>>> of an x86 hardware specific value.
>>>> ---
>>>> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 +--
>>>> arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
>>>> arch/x86/kernel/cpu/resctrl/monitor.c | 42 +++++++++++++++--------
>>>> include/linux/resctrl.h | 1 +
>>>> 4 files changed, 31 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> index 25baacd331e0..c8ca7184c6d9 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void
>>>> *arg)
>>>> mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>> - if (rr.val & RMID_VAL_ERROR)
>>>> + if (rr.err == -EIO)
>>>> seq_puts(m, "Error\n");
>>>> - else if (rr.val & RMID_VAL_UNAVAIL)
>>>> + else if (rr.err == -EINVAL)
>>>> seq_puts(m, "Unavailable\n");
>>>> else
>>>> seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>>
>>> This patch breaks the earlier fix
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fh%3Dv5.15-rc6%26id%3D064855a69003c24bd6b473b367d364e418c57625&data=04%7C01%7Cbabu.moger%40amd.com%7C85219a5827114935cdaa08d993f59fa0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703505420472920%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yP8awDgGGZ%2BWj5ZItdTNJItTVuK828yGnibwq%2BrVaf0%3D&reserved=0
>>>
>>>
>>> When the user reads the events on the default monitoring group with
>>> multiple subgroups, the events on all subgroups are consolidated
>>> together. In case if the last rmid read was resulted in error then whole
>>> group will be reported as error. The err field needs to be cleared.
>>>
>>> Please add this patch to clear the error.
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index 14bc843043da..0e4addf237ec 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> @@ -444,6 +444,8 @@ void mon_event_count(void *info)
>>> /* Report error if none of rmid_reads are successful */
>>> if (ret_val)
>>> rr->val = ret_val;
>>> + else
>>> + rr->err = 0;
>>> }
>>>
>>> /*
>>>
>>
>> Good catch, thank you.
>>
>> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
>> was taken into account by this patch and needs a bigger rework than the
>> above fixup. For example, if I understand correctly ret_val is the error
>> and rr->val no longer expected to contain the error after this patch. So
>> keeping that assignment to rr->val is not correct.
>
> Yes. You are right. rr->val is not expected to contain the error.
> Hopefully, this should help.
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 14bc843043da..105d972cc511 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -441,9 +441,9 @@ void mon_event_count(void *info)
> }
> }
>
> - /* Report error if none of rmid_reads are successful */
> - if (ret_val)
> - rr->val = ret_val;
> + /* Clear the error if at least one of the rmid reads succeed */
> + if (ret_val == 0)
> + rr->err = 0;
> }
>
> /*
>
Yes, this looks good. If the first __mon_event_count() succeeds but a
following one fails then the data still needs to be reported so the
error code needs to be fixed up afterwards and cannot be done inside
__mon_event_count(). Thank you very much.
Reinette
Powered by blists - more mailing lists