linux-kernel - Re: [PATCH] x86/resctrl: Fix default monitoring groups reporting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b5370944-4770-b957-8b42-fcfdea91e079@amd.com>
Date:   Tue, 20 Jul 2021 14:15:15 -0500
From:   Babu Moger <babu.moger@....com>
To:     Reinette Chatre <reinette.chatre@...el.com>, tglx@...utronix.de,
        mingo@...hat.com, bp@...en8.de
Cc:     fenghua.yu@...el.com, x86@...nel.org, linux-kernel@...r.kernel.org,
        hpa@...or.com, pawel.szulik@...el.com,
        "Luck, Tony" <tony.luck@...el.com>
Subject: Re: [PATCH] x86/resctrl: Fix default monitoring groups reporting



On 7/19/21 3:43 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 6/29/2021 11:07 AM, Babu Moger wrote:
>> From: Babu Moger <Babu.Moger@....com>
>>
>> Creating a new sub monitoring group in the root /sys/fs/resctrl leads to
>> getting the "Unavailable" value for mbm_total_bytes and mbm_local_bytes on
>> the entire filesystem.
>>
>> Steps to reproduce.
>> 1. #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> 2. #cd /sys/fs/resctrl/
>>
>> 3. #cat mon_data/mon_L3_00/mbm_total_bytes 23189832
>>
>> 4. #mkdir mon_groups/test1 (create sub monitor group)
>>
>> 5. #cat mon_data/mon_L3_00/mbm_total_bytes Unavailable
>>
>> When a new monitoring group is created, a new RMID is assigned to the new
>> group. But the RMID is not active yet. When the events are read on the new
>> RMID, it is expected to report the status as "Unavailable".
>>
>> When the user reads the events on the default monitoring group with
>> multiple subgroups, the events on all sub groups are consolidated together.
>> Currently, if any of the RMID reads report as "Unavailable", then
>> everything will be reported as "Unavailable".
>>
>> Fix the issue by discarding the "Unavailable" reads and reporting all the
>> successful RMID reads. This is not a problem on Intel systesm as Intel
> 
> systesm -> systems

Sure.
> 
>> reports 0 on Inactive RMIDs.
>>
>> Reported-by: Paweł Szulik <pawel.szulik@...el.com>
>> Signed-off-by: Babu Moger <Babu.Moger@....com>
>> Link:
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D213311&amp;data=04%7C01%7Cbabu.moger%40amd.com%7C6931f61de9f34fc2175708d94af5eae3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637623242329908534%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=eyJwUSKewq8msA6iv6%2FVrbr9QBLUxZKhyJneRREBfm0%3D&amp;reserved=0
>>
> 
> Is a "Fixes" available? If this is specific to AMD then could this be the
> change that enabled AMD systems?

Yes, I will add "Fixes" in my next revision. Hope I will find the proper
commit. I would consider this as a generic fix.


> 
>> ---
>>   arch/x86/kernel/cpu/resctrl/monitor.c |   27 +++++++++++++--------------
>>   1 file changed, 13 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index dbeaa8409313..9573a30c0587 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -285,15 +285,14 @@ static u64 mbm_overflow_count(u64 prev_msr, u64
>> cur_msr, unsigned int width)
>>       return chunks >>= shift;
>>   }
>>   -static int __mon_event_count(u32 rmid, struct rmid_read *rr)
>> +static u64 __mon_event_count(u32 rmid, struct rmid_read *rr)
>>   {
>>       struct mbm_state *m;
>>       u64 chunks, tval;
>>         tval = __rmid_read(rmid, rr->evtid);
>>       if (tval & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL)) {
>> -        rr->val = tval;
>> -        return -EINVAL;
>> +        return tval;
>>       }
>>       switch (rr->evtid) {
>>       case QOS_L3_OCCUP_EVENT_ID:
>> @@ -305,12 +304,6 @@ static int __mon_event_count(u32 rmid, struct
>> rmid_read *rr)
>>       case QOS_L3_MBM_LOCAL_EVENT_ID:
>>           m = &rr->d->mbm_local[rmid];
>>           break;
>> -    default:
>> -        /*
>> -         * Code would never reach here because
>> -         * an invalid event id would fail the __rmid_read.
>> -         */
>> -        return -EINVAL;
>>       }
>>         if (rr->first) {
>> @@ -361,23 +354,29 @@ void mon_event_count(void *info)
>>       struct rdtgroup *rdtgrp, *entry;
>>       struct rmid_read *rr = info;
>>       struct list_head *head;
>> +    u64 ret_val;
>>         rdtgrp = rr->rgrp;
>>   -    if (__mon_event_count(rdtgrp->mon.rmid, rr))
>> -        return;
>> +    ret_val = __mon_event_count(rdtgrp->mon.rmid, rr);
>>         /*
>> -     * For Ctrl groups read data from child monitor groups.
>> +     * For Ctrl groups read data from child monitor groups and
>> +     * add them together. Count events which are read successfully.
>> +     * Discard the rmid_read's reporting errors.
>>        */
>>       head = &rdtgrp->mon.crdtgrp_list;
>>         if (rdtgrp->type == RDTCTRL_GROUP) {
>>           list_for_each_entry(entry, head, mon.crdtgrp_list) {
>> -            if (__mon_event_count(entry->mon.rmid, rr))
>> -                return;
>> +            if (__mon_event_count(entry->mon.rmid, rr) == 0)
>> +                ret_val = 0;
>>           }
>>       }
>> +
>> +    /* Report error if none of rmid_reads are successful */
>> +    if (ret_val)
>> +        rr->val = ret_val;
>>   }
>>     /*
>>
> 
> With the commit message comments addressed:
> Acked-by: Reinette Chatre <reinette.chatre@...el.com>

Thank You.

> 
> Thank you very much
> 
> Reinette