Message-ID: <438e6316-3a42-7f96-44c4-528f905eb832@amd.com>
Date: Wed, 27 Oct 2021 13:59:13 -0500
From: Babu Moger <babu.moger@....com>
To: James Morse <james.morse@....com>,
Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
shameerali.kolothum.thodi@...wei.com,
Jamie Iles <jamie@...iainc.com>,
D Scott Phillips OS <scott@...amperecomputing.com>,
lcherian@...vell.com, bobo.shaobowang@...wei.com,
tan.shaopeng@...itsu.com
Subject: Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
Hi James,
On 10/27/21 11:50 AM, James Morse wrote:
> Hi Reinette, Babu,
>
> On 20/10/2021 21:28, Reinette Chatre wrote:
>> On 10/20/2021 12:22 PM, Babu Moger wrote:
>>> On 10/20/21 1:15 PM, Reinette Chatre wrote:
>>>> On 10/19/2021 4:20 PM, Babu Moger wrote:
>>>>> On 10/1/21 11:02 AM, James Morse wrote:
>>>>>> __rmid_read() selects the specified eventid and returns the counter
>>>>>> value from the msr. The error handling is architecture specific, and
>>>>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>>>>
>>>>>> Error handling should be handled by architecture specific code, as
>>>>>> a different architecture may have different requirements. MPAM's
>>>>>> counters can report that they are 'not ready', requiring a second
>>>>>> read after a short delay. This should be hidden from resctrl.
>>>>>>
>>>>>> Make __rmid_read() the architecture specific function for reading
>>>>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>>>>> handling into it.
>
>
>>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> index 25baacd331e0..c8ca7184c6d9 100644
>>>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>>>>>  	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>>>> -	if (rr.val & RMID_VAL_ERROR)
>>>>>> +	if (rr.err == -EIO)
>>>>>>  		seq_puts(m, "Error\n");
>>>>>> -	else if (rr.val & RMID_VAL_UNAVAIL)
>>>>>> +	else if (rr.err == -EINVAL)
>>>>>>  		seq_puts(m, "Unavailable\n");
>>>>>>  	else
>>>>>>  		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>>>>
>>>>> This patch breaks the earlier fix
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
>
> Aha!
>
>
>>>>> When the user reads the events on the default monitoring group with
>>>>> multiple subgroups, the events on all subgroups are consolidated
>>>>> together. If the last rmid read resulted in an error, then the whole
>>>>> group will be reported as an error. The err field needs to be cleared.
>>>>>
>>>>> Please add this patch to clear the error.
>
>>>> Good catch, thank you.
>>>>
>>>> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
>>>> was taken into account by this patch and needs a bigger rework than the
>>>> above fixup. For example, if I understand correctly, ret_val is the error
>>>> and rr->val is no longer expected to contain the error after this patch. So
>>>> keeping that assignment to rr->val is not correct.
>>>
>>> Yes. You are right. rr->val is not expected to contain the error.
>>> Hopefully, this should help.
>
>> Yes, this looks good. If the first __mon_event_count() succeeds but a following one fails
>> then the data still needs to be reported so the error code needs to be fixed up afterwards
>> and cannot be done inside __mon_event_count(). Thank you very much.
>
> Thanks both! I should have worked this out when splitting msr_val into two values, which
> end up getting set the same.
>
> I think the 'Unavailable' issue is subtle enough that it deserves a block comment.
> I've replaced the rr->val chunk with:
> | /*
> |  * __mon_event_count() calls for newly created monitor groups may
> |  * report -EINVAL/Unavailable if the monitor hasn't seen any traffic.
> |  * If the first call for the control group succeeds, discard any error
> |  * set by reads of monitor groups.
> |  */
> | if (ret_val == 0)
> | 	rr->err = 0;
Looks good.
Thanks
Babu
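
For readers following the thread, here is a minimal user-space sketch of the
behaviour discussed above: the architecture code turns the raw counter bits
into an error code, and the caller discards a later monitor-group error when
the first (control group) read succeeded. This is not the kernel code. The
helper names arch_rmid_decode(), count_one() and count_all(), the array of raw
values, and the bit positions chosen for RMID_VAL_ERROR and RMID_VAL_UNAVAIL
are assumptions made for illustration; only the final "if (ret_val == 0)"
fixup and the -EIO/-EINVAL mapping are taken from the messages above.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-ins for the flag bits tested in the quoted diff. */
#define RMID_VAL_ERROR   (1ULL << 63)
#define RMID_VAL_UNAVAIL (1ULL << 62)

struct rmid_read {
	uint64_t val;	/* sum of the counters read successfully */
	int err;	/* 0, -EIO or -EINVAL, set by the most recent read */
};

/* Hypothetical arch helper: split a raw value into counter and error. */
static int arch_rmid_decode(uint64_t raw, uint64_t *val)
{
	if (raw & RMID_VAL_ERROR)
		return -EIO;
	if (raw & RMID_VAL_UNAVAIL)
		return -EINVAL;
	*val = raw;
	return 0;
}

/* Add one group's counter to rr. On error, rr->val is left untouched. */
static int count_one(uint64_t raw, struct rmid_read *rr)
{
	uint64_t val;

	rr->err = arch_rmid_decode(raw, &val);
	if (rr->err)
		return rr->err;
	rr->val += val;
	return 0;
}

/* raw[0] is the control group; raw[1..n-1] are its monitor groups. */
void count_all(const uint64_t *raw, int n, struct rmid_read *rr)
{
	int ret_val = count_one(raw[0], rr);
	int i;

	for (i = 1; i < n; i++)
		count_one(raw[i], rr);

	/*
	 * A later monitor-group read may have left rr->err set. If the
	 * first read succeeded, the accumulated value is still valid.
	 */
	if (ret_val == 0)
		rr->err = 0;
}
```

With raw values { 100, 50, RMID_VAL_UNAVAIL }, the last read leaves rr.err at
-EINVAL, and the final fixup resets it to 0 while rr.val stays at 150. Without
the fixup, the whole group would be shown as "Unavailable".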