linux-kernel - Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()

Message-ID: <438e6316-3a42-7f96-44c4-528f905eb832@amd.com>
Date:   Wed, 27 Oct 2021 13:59:13 -0500
From:   Babu Moger <babu.moger@....com>
To:     James Morse <james.morse@....com>,
        Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Cc:     Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        shameerali.kolothum.thodi@...wei.com,
        Jamie Iles <jamie@...iainc.com>,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        lcherian@...vell.com, bobo.shaobowang@...wei.com,
        tan.shaopeng@...itsu.com
Subject: Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()

Hi James,

On 10/27/21 11:50 AM, James Morse wrote:
> Hi Reinette, Babu,
> 
> On 20/10/2021 21:28, Reinette Chatre wrote:
>> On 10/20/2021 12:22 PM, Babu Moger wrote:
>>> On 10/20/21 1:15 PM, Reinette Chatre wrote:
>>>> On 10/19/2021 4:20 PM, Babu Moger wrote:
>>>>> On 10/1/21 11:02 AM, James Morse wrote:
>>>>>> __rmid_read() selects the specified eventid and returns the counter
>>>>>> value from the MSR. The error handling is architecture-specific, and
>>>>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>>>>
>>>>>> Error handling should be done by architecture-specific code, as
>>>>>> a different architecture may have different requirements. MPAM's
>>>>>> counters can report that they are 'not ready', requiring a second
>>>>>> read after a short delay. This should be hidden from resctrl.
>>>>>>
>>>>>> Make __rmid_read() the architecture-specific function for reading
>>>>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>>>>> handling into it.
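A minimal userspace sketch of the split described above, assuming a fake counter in place of the MSR access: the architecture-specific read decodes the status bits itself and hands resctrl either a count or a negative errno. RMID_VAL_ERROR and RMID_VAL_UNAVAIL use the x86 bit positions (63 and 62); HYP_VAL_NRDY, the choice of -EBUSY for a counter that stays not-ready, and every sketch_*/fake_* name are hypothetical, added only to show where an MPAM-style retry could be hidden from resctrl.

```c
#include <errno.h>
#include <stdint.h>

/* Status bits of the x86 counter value, as used by resctrl. */
#define RMID_VAL_ERROR   (1ULL << 63)
#define RMID_VAL_UNAVAIL (1ULL << 62)
/* Hypothetical "not ready" flag standing in for an MPAM-style status. */
#define HYP_VAL_NRDY     (1ULL << 61)

/*
 * Hypothetical fake counter replacing the MSR access: each read returns
 * the next raw value in the array, repeating the last one once it runs
 * out. Assumes n >= 1.
 */
struct fake_counter {
	const uint64_t *raw;
	int n;
	int next;
};

static uint64_t fake_counter_read(struct fake_counter *c)
{
	if (c->next < c->n)
		return c->raw[c->next++];
	return c->raw[c->n - 1];
}

/*
 * Sketch of an arch-specific read: status decoding and any retry stay
 * here, so the caller only sees 0 plus a count, or a negative errno.
 */
static int sketch_arch_rmid_read(struct fake_counter *c, uint64_t *val)
{
	uint64_t raw = fake_counter_read(c);

	if (raw & HYP_VAL_NRDY) {
		/* Real code would wait briefly; the sketch re-reads at once. */
		raw = fake_counter_read(c);
		if (raw & HYP_VAL_NRDY)
			return -EBUSY;
	}
	if (raw & RMID_VAL_ERROR)
		return -EIO;
	if (raw & RMID_VAL_UNAVAIL)
		return -EINVAL;

	*val = raw;
	return 0;
}
```

With this split, a caller such as rdtgroup_mondata_show() only compares the returned error against -EIO and -EINVAL, as the quoted diff does, and never sees the hardware encoding or the retry.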
> 
> 
>>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> index 25baacd331e0..c8ca7184c6d9 100644
>>>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>>>>>      mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>>>>
>>>>>> -    if (rr.val & RMID_VAL_ERROR)
>>>>>> +    if (rr.err == -EIO)
>>>>>>          seq_puts(m, "Error\n");
>>>>>> -    else if (rr.val & RMID_VAL_UNAVAIL)
>>>>>> +    else if (rr.err == -EINVAL)
>>>>>>          seq_puts(m, "Unavailable\n");
>>>>>>      else
>>>>>>          seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>>>>
>>>>> This patch breaks the earlier fix
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
> 
> Aha!
> 
> 
>>>>> When the user reads the events on the default monitoring group with
>>>>> multiple subgroups, the events on all subgroups are consolidated
>>>>> together. If the last RMID read resulted in an error, the whole group
>>>>> is reported as an error. The err field needs to be cleared.
>>>>>
>>>>> Please add this patch to clear the error.
> 
>>>> Good catch, thank you.
>>>>
>>>> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
>>>> was taken into account by this patch; it needs a bigger rework than the
>>>> above fixup. For example, if I understand correctly, ret_val is the error
>>>> and rr->val is no longer expected to contain the error after this patch, so
>>>> keeping that assignment to rr->val is not correct.
>>>
>>> Yes. You are right. rr->val is not expected to contain the error.
>>> Hopefully, this should help.
> 
>> Yes, this looks good. If the first __mon_event_count() succeeds but a following one fails
>> then the data still needs to be reported so the error code needs to be fixed up afterwards
>> and cannot be done inside __mon_event_count(). Thank you very much.
> 
> Thanks both! I should have worked this out when splitting msr_val into two values, which
> end up getting set the same.
> 
> I think the 'Unavailable' issue is subtle enough that it deserves a block comment.
> I've replaced the rr->val chunk with:
> |	/*
> |	 * __mon_event_count() calls for newly created monitor groups may
> |	 * report -EINVAL/Unavailable if the monitor hasn't seen any traffic.
> |	 * If the first call for the control group succeeds, discard any error
> |	 * set by reads of monitor groups.
> |	 */
> |	if (ret_val == 0)
> |		rr->err = 0;

Looks good.
Thanks
Babu
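A minimal userspace sketch of the aggregation fix-up discussed in this thread, assuming simplified, hypothetical types (struct sketch_rmid_read, struct sketch_sample) in place of the kernel's struct rmid_read and real counter reads. It follows the block comment James quotes: only the control group's own read decides whether errors from its monitor groups are discarded. The real mon_event_count() may apply further rules that are not shown here.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct rmid_read. */
struct sketch_rmid_read {
	uint64_t val; /* counts summed over every successful read */
	int err;      /* error from the most recent failed read, or 0 */
};

/* Hypothetical result of reading one RMID: count is valid if err == 0. */
struct sketch_sample {
	int err;
	uint64_t count;
};

/* Models one __mon_event_count() call: add the count or record the error. */
static int sketch_event_count(const struct sketch_sample *s,
			      struct sketch_rmid_read *rr)
{
	if (s->err) {
		rr->err = s->err;
		return s->err;
	}
	rr->val += s->count;
	return 0;
}

/*
 * Models the aggregation for a control group: read its own RMID, then
 * each child monitor group, then apply the fix-up from the thread.
 */
static void sketch_mon_event_count(const struct sketch_sample *ctrl,
				   const struct sketch_sample *mon, int nr_mon,
				   struct sketch_rmid_read *rr)
{
	int ret_val = sketch_event_count(ctrl, rr);

	for (int i = 0; i < nr_mon; i++)
		sketch_event_count(&mon[i], rr);

	/* Control group read succeeded: discard errors from monitor groups. */
	if (ret_val == 0)
		rr->err = 0;
}
```

Without the final fix-up, a control group whose newest monitor group has not seen any traffic would keep rr->err at -EINVAL, so the whole group would be shown as "Unavailable" even though the control group and its other monitor groups were read successfully. That is the regression Babu reports.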