Message-ID: <b5370944-4770-b957-8b42-fcfdea91e079@amd.com>
Date: Tue, 20 Jul 2021 14:15:15 -0500
From: Babu Moger <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de
Cc: fenghua.yu@...el.com, x86@...nel.org, linux-kernel@...r.kernel.org,
hpa@...or.com, pawel.szulik@...el.com,
"Luck, Tony" <tony.luck@...el.com>
Subject: Re: [PATCH] x86/resctrl: Fix default monitoring groups reporting
On 7/19/21 3:43 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/29/2021 11:07 AM, Babu Moger wrote:
>> From: Babu Moger <Babu.Moger@....com>
>>
>> Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
>> getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on
>> the entire filesystem.
>>
>> Steps to reproduce.
>> 1. #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> 2. #cd /sys/fs/resctrl/
>>
>> 3. #cat mon_data/mon_L3_00/mbm_total_bytes
>>    23189832
>>
>> 4. #mkdir mon_groups/test1 (create sub monitor group)
>>
>> 5. #cat mon_data/mon_L3_00/mbm_total_bytes
>>    Unavailable
>>
>> When a new monitoring group is created, a new RMID is assigned to the new
>> group. But the RMID is not active yet. When the events are read on the new
>> RMID, it is expected to report the status as "Unavailable".
>>
>> When the user reads the events on the default monitoring group with
>> multiple subgroups, the events on all sub groups are consolidated together.
>> Currently, if any of the RMID reads report as "Unavailable", then
>> everything will be reported as "Unavailable".
>>
>> Fix the issue by discarding the "Unavailable" reads and reporting all the
>> successful RMID reads. This is not a problem on Intel systesm as Intel
>
> systesm -> systems
Sure.
>
>> reports 0 on Inactive RMIDs.
>>
>> Reported-by: Paweł Szulik <pawel.szulik@...el.com>
>> Signed-off-by: Babu Moger <Babu.Moger@....com>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=213311
>>
>
> Is a "Fixes" available? If this is specific to AMD then could this be the
> change that enabled AMD systems?
Yes, I will add a "Fixes" tag in my next revision, once I find the proper
commit. I would consider this a generic fix.
>
>> ---
>> arch/x86/kernel/cpu/resctrl/monitor.c | 27 +++++++++++++--------------
>> 1 file changed, 13 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index dbeaa8409313..9573a30c0587 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -285,15 +285,14 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>>  	return chunks >>= shift;
>>  }
>>
>> -static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>> +static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>>  {
>>  	struct mbm_state *m;
>>  	u64 chunks, tval;
>>
>>  	tval = __rmid_read(rmid, rr->evtid);
>>  	if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
>> -		rr->val = tval;
>> -		return -EINVAL;
>> +		return tval;
>>  	}
>>  	switch (rr->evtid) {
>>  	case QOS_L3_OCCUP_EVENT_ID:
>> @@ -305,12 +304,6 @@ static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>>  	case QOS_L3_MBM_LOCAL_EVENT_ID:
>>  		m = &rr->d->mbm_local[rmid];
>>  		break;
>> -	default:
>> -		/*
>> -		 * Code would never reach here because
>> -		 * an invalid event id would fail the __rmid_read.
>> -		 */
>> -		return -EINVAL;
>>  	}
>>
>>  	if (rr->first) {
>> @@ -361,23 +354,29 @@ void mon_event_count(void *info)
>>  	struct rdtgroup *rdtgrp, *entry;
>>  	struct rmid_read *rr = info;
>>  	struct list_head *head;
>> +	u64 ret_val;
>>
>>  	rdtgrp = rr->rgrp;
>>
>> -	if (__mon_event_count(rdtgrp->mon.rmid, rr))
>> -		return;
>> +	ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
>>
>>  	/*
>> -	 * For Ctrl groups read data from child monitor groups.
>> +	 * For Ctrl groups read data from child monitor groups and
>> +	 * add them together. Count events which are read successfully.
>> +	 * Discard the rmid_read's reporting errors.
>>  	 */
>>  	head = &rdtgrp->mon.crdtgrp_list;
>>
>>  	if (rdtgrp->type == RDTCTRL_GROUP) {
>>  		list_for_each_entry(entry, head, mon.crdtgrp_list) {
>> -			if (__mon_event_count(entry->mon.rmid, rr))
>> -				return;
>> +			if (__mon_event_count(entry->mon.rmid, rr) == 0)
>> +				ret_val = 0;
>>  		}
>>  	}
>> +
>> +	/* Report error if none of rmid_reads are successful */
>> +	if (ret_val)
>> +		rr->val = ret_val;
>>  }
>>
>>  /*
>>
>
> With the commit message comments addressed:
> Acked-by: Reinette Chatre <reinette.chatre@...el.com>
Thank you.
>
> Thank you very much
>
> Reinette
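
For readers following the logic of the patch, the aggregation it describes
can be modelled in user space. The sketch below is only an illustration
under stated assumptions: count_one() and aggregate() are hypothetical
stand-ins for __mon_event_count() and mon_event_count(), the raw values are
treated as already-converted counts (the kernel's MBM overflow handling is
left out), and the two status bits are taken to be bits 63 and 62, as in the
kernel's resctrl headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Status bits carried in a counter read (assumed: bits 63 and 62). */
#define RMID_VAL_ERROR   (1ULL << 63)
#define RMID_VAL_UNAVAIL (1ULL << 62)

/*
 * Hypothetical stand-in for the patched __mon_event_count():
 * on a good read, add the value to *sum and return 0;
 * on a failed read, leave *sum alone and return the raw status value.
 */
static uint64_t count_one(uint64_t raw, uint64_t *sum)
{
	if (raw & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
		return raw;

	*sum += raw;
	return 0;
}

/*
 * Hypothetical stand-in for the patched mon_event_count():
 * read the parent group, then every child monitor group.
 * Failed reads are discarded; the status is cleared as soon as
 * any child read succeeds, so an error is reported only when
 * no read at all was successful.
 */
static uint64_t aggregate(uint64_t parent, const uint64_t *children,
			  size_t nr_children, uint64_t *sum)
{
	uint64_t ret = count_one(parent, sum);
	size_t i;

	for (i = 0; i < nr_children; i++) {
		if (count_one(children[i], sum) == 0)
			ret = 0;
	}

	return ret;
}
```

With a parent read of 23189832 and one freshly created child whose read
comes back with RMID_VAL_UNAVAIL set, aggregate() returns 0 and the sum
stays at 23189832, so the group reports a value rather than "Unavailable".
Only when every read fails does the caller see a non-zero status.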