[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dfe2f33c-6103-9699-42f9-73983fa62057@intel.com>
Date: Wed, 20 Oct 2021 11:15:28 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <babu.moger@....com>, James Morse <james.morse@....com>,
<x86@...nel.org>, <linux-kernel@...r.kernel.org>
CC: Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
<shameerali.kolothum.thodi@...wei.com>,
Jamie Iles <jamie@...iainc.com>,
D Scott Phillips OS <scott@...amperecomputing.com>,
<lcherian@...vell.com>, <bobo.shaobowang@...wei.com>,
<tan.shaopeng@...itsu.com>
Subject: Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
Hi Babu,
On 10/19/2021 4:20 PM, Babu Moger wrote:
> Hi James,
>
> On 10/1/21 11:02 AM, James Morse wrote:
>> __rmid_read() selects the specified eventid and returns the counter
>> value from the msr. The error handling is architecture specific, and
>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>
>> Error handling should be handled by architecture specific code, as
>> a different architecture may have different requirements. MPAM's
>> counters can report that they are 'not ready', requiring a second
>> read after a short delay. This should be hidden from resctrl.
>>
>> Make __rmid_read() the architecture specific function for reading
>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>> handling into it.
>>
>> Signed-off-by: James Morse <james.morse@....com>
>> ---
>> Changes since v1:
>> * Return EINVAL from the impossible case in __mon_event_count() instead
>> of an x86 hardware specific value.
>> ---
>> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 +--
>> arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
>> arch/x86/kernel/cpu/resctrl/monitor.c | 42 +++++++++++++++--------
>> include/linux/resctrl.h | 1 +
>> 4 files changed, 31 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 25baacd331e0..c8ca7184c6d9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>
>> mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>
>> - if (rr.val & RMID_VAL_ERROR)
>> + if (rr.err == -EIO)
>> seq_puts(m, "Error\n");
>> - else if (rr.val & RMID_VAL_UNAVAIL)
>> + else if (rr.err == -EINVAL)
>> seq_puts(m, "Unavailable\n");
>> else
>> seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>
> This patch breaks the earlier fix
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
>
> When the user reads the events on the default monitoring group with
> multiple subgroups, the events on all subgroups are consolidated
> together. In case if the last rmid read was resulted in error then whole
> group will be reported as error. The err field needs to be cleared.
>
> Please add this patch to clear the error.
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 14bc843043da..0e4addf237ec 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -444,6 +444,8 @@ void mon_event_count(void *info)
> /* Report error if none of rmid_reads are successful */
> if (ret_val)
> rr->val = ret_val;
> + else
> + rr->err = 0;
> }
>
> /*
>
Good catch, thank you.
Even so, I do not think mon_event_count()'s usage of __mon_event_count()
was taken into account by this patch and needs a bigger rework than the
above fixup. For example, if I understand correctly ret_val is the error
and rr->val no longer expected to contain the error after this patch. So
keeping that assignment to rr->val is not correct.
Reinette
Powered by blists - more mailing lists