Message-ID: <859c5573-b025-7754-94bf-294c7da3abdc@arm.com>
Date: Wed, 27 Oct 2021 17:50:31 +0100
From: James Morse <james.morse@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
Babu Moger <babu.moger@....com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
shameerali.kolothum.thodi@...wei.com,
Jamie Iles <jamie@...iainc.com>,
D Scott Phillips OS <scott@...amperecomputing.com>,
lcherian@...vell.com, bobo.shaobowang@...wei.com,
tan.shaopeng@...itsu.com
Subject: Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()
Hi Reinette, Babu,
On 20/10/2021 21:28, Reinette Chatre wrote:
> On 10/20/2021 12:22 PM, Babu Moger wrote:
>> On 10/20/21 1:15 PM, Reinette Chatre wrote:
>>> On 10/19/2021 4:20 PM, Babu Moger wrote:
>>>> On 10/1/21 11:02 AM, James Morse wrote:
>>>>> __rmid_read() selects the specified eventid and returns the counter
>>>>> value from the msr. The error handling is architecture specific, and
>>>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>>>>
>>>>> Error handling should be handled by architecture specific code, as
>>>>> a different architecture may have different requirements. MPAM's
>>>>> counters can report that they are 'not ready', requiring a second
>>>>> read after a short delay. This should be hidden from resctrl.
>>>>>
>>>>> Make __rmid_read() the architecture specific function for reading
>>>>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>>>>> handling into it.
>>>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> index 25baacd331e0..c8ca7184c6d9 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>>>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>>>>  	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>>>> -	if (rr.val & RMID_VAL_ERROR)
>>>>> +	if (rr.err == -EIO)
>>>>>  		seq_puts(m, "Error\n");
>>>>> -	else if (rr.val & RMID_VAL_UNAVAIL)
>>>>> +	else if (rr.err == -EINVAL)
>>>>>  		seq_puts(m, "Unavailable\n");
>>>>>  	else
>>>>>  		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
>>>>
>>>> This patch breaks the earlier fix
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
Aha!
>>>> When the user reads the events on the default monitoring group with
>>>> multiple subgroups, the events on all subgroups are consolidated
>>>> together. If the last RMID read results in an error, the whole
>>>> group is reported as an error. The err field needs to be cleared.
>>>>
>>>> Please add this patch to clear the error.
>>> Good catch, thank you.
>>>
>>> Even so, I do not think mon_event_count()'s usage of __mon_event_count()
>>> was taken into account by this patch, and it needs a bigger rework than
>>> the above fixup. For example, if I understand correctly, ret_val is the
>>> error and rr->val is no longer expected to contain the error after this
>>> patch, so keeping that assignment to rr->val is not correct.
>>
>> Yes. You are right. rr->val is not expected to contain the error.
>> Hopefully, this should help.
> Yes, this looks good. If the first __mon_event_count() succeeds but a following one fails,
> then the data still needs to be reported, so the error code needs to be fixed up afterwards;
> this cannot be done inside __mon_event_count(). Thank you very much.
Thanks both! I should have worked this out when splitting msr_val into two values, which
end up getting set to the same value.
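For reference, the split described in the quoted commit message (the architecture code turns the counter's flag bits into an errno, and rdtgroup_mondata_show() only looks at that errno) can be modelled in user space roughly as below. This is a sketch, not the patch: the sim_* names are hypothetical, the flag positions are assumed to mirror x86's RMID_VAL_ERROR (bit 63) and RMID_VAL_UNAVAIL (bit 62), and formatting into a buffer stands in for the seq_file.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Assumed to mirror x86's RMID_VAL_ERROR / RMID_VAL_UNAVAIL bit positions. */
#define SIM_VAL_ERROR	(1ULL << 63)
#define SIM_VAL_UNAVAIL	(1ULL << 62)

/*
 * Hypothetical architecture read: converts the raw counter's flag bits
 * into an errno so the filesystem code never sees them.
 * On success, stores the count in *val and returns 0.
 */
static int sim_arch_rmid_read(u64 raw, u64 *val)
{
	if (raw & SIM_VAL_ERROR)
		return -EIO;
	if (raw & SIM_VAL_UNAVAIL)
		return -EINVAL;
	*val = raw;
	return 0;
}

/*
 * Hypothetical generic side, modelled on rdtgroup_mondata_show(): it only
 * inspects the errno, and writes into buf instead of a seq_file.
 */
static void sim_mondata_show(char *buf, size_t len, int err, u64 val,
			     u64 mon_scale)
{
	if (err == -EIO)
		snprintf(buf, len, "Error\n");
	else if (err == -EINVAL)
		snprintf(buf, len, "Unavailable\n");
	else
		snprintf(buf, len, "%llu\n",
			 (unsigned long long)(val * mon_scale));
}
```

With this split the generic code never tests flag bits, which is what would let an MPAM implementation retry a 'not ready' counter inside its own read function without resctrl noticing.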
I think the 'Unavailable' issue is subtle enough that it deserves a block comment.
I've replaced the rr->val chunk with:
| 	/*
| 	 * __mon_event_count() calls for newly created monitor groups may
| 	 * report -EINVAL/Unavailable if the monitor hasn't seen any traffic.
| 	 * If the first call for the control group succeeds, discard any error
| 	 * set by reads of monitor groups.
| 	 */
| 	if (ret_val == 0)
| 		rr->err = 0;
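To see why the error has to be fixed up after the loop rather than inside __mon_event_count(), here is a user-space model of the aggregation in mon_event_count(). It assumes, as Babu's description implies, that each __mon_event_count() call overwrites rr->err, so a failing read of the last monitor group hides data already summed from earlier reads. The sim_* types and helpers and the apply_fix switch are hypothetical, and the fix-up follows the block comment literally: only the control group's own read decides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-in for struct rmid_read: summed value plus an error. */
struct sim_rmid_read {
	u64 val;
	int err;
};

/* Hypothetical per-RMID hardware result: an errno, or 0 and a count. */
struct sim_counter {
	int err;
	u64 count;
};

/*
 * Stand-in for __mon_event_count(): every call overwrites rr->err, so
 * after a series of calls rr->err reflects only the last read.
 */
static int sim_event_count(const struct sim_counter *c,
			   struct sim_rmid_read *rr)
{
	rr->err = c->err;
	if (rr->err)
		return rr->err;
	rr->val += c->count;
	return 0;
}

/*
 * Stand-in for mon_event_count() on a control group: read the group's own
 * RMID first, then add in each child monitor group.
 */
static void sim_mon_event_count(const struct sim_counter *ctrl,
				const struct sim_counter *mon, size_t nr_mon,
				struct sim_rmid_read *rr, int apply_fix)
{
	int ret_val;
	size_t i;

	assert(nr_mon == 0 || mon != NULL);

	rr->val = 0;
	ret_val = sim_event_count(ctrl, rr);

	for (i = 0; i < nr_mon; i++)
		sim_event_count(&mon[i], rr);

	/* The fix-up: a successful control-group read clears stale errors. */
	if (apply_fix && ret_val == 0)
		rr->err = 0;
}
```

For example, with the control group reading 100, one monitor group reading 20 and a newly created monitor group returning -EINVAL last, the model sums 120 but leaves rr.err at -EINVAL (so the whole group would show "Unavailable") unless the fix-up is applied.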
Thanks.
James