lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 20 Oct 2021 13:28:50 -0700 From: Reinette Chatre <reinette.chatre@...el.com> To: Babu Moger <babu.moger@....com>, James Morse <james.morse@....com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org> CC: Fenghua Yu <fenghua.yu@...el.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, H Peter Anvin <hpa@...or.com>, <shameerali.kolothum.thodi@...wei.com>, Jamie Iles <jamie@...iainc.com>, D Scott Phillips OS <scott@...amperecomputing.com>, <lcherian@...vell.com>, <bobo.shaobowang@...wei.com>, <tan.shaopeng@...itsu.com> Subject: Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read() Hi Babu, On 10/20/2021 12:22 PM, Babu Moger wrote: > On 10/20/21 1:15 PM, Reinette Chatre wrote: >> On 10/19/2021 4:20 PM, Babu Moger wrote: >>> On 10/1/21 11:02 AM, James Morse wrote: >>>> __rmid_read() selects the specified eventid and returns the counter >>>> value from the msr. The error handling is architecture specific, and >>>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count(). >>>> >>>> Error handling should be handled by architecture specific code, as >>>> a different architecture may have different requirements. MPAM's >>>> counters can report that they are 'not ready', requiring a second >>>> read after a short delay. This should be hidden from resctrl. >>>> >>>> Make __rmid_read() the architecture specific function for reading >>>> a counter. Rename it resctrl_arch_rmid_read() and move the error >>>> handling into it. >>>> >>>> Signed-off-by: James Morse <james.morse@....com> >>>> --- >>>> Changes since v1: >>>> * Return EINVAL from the impossible case in __mon_event_count() instead >>>> of an x86 hardware specific value. >>>> --- >>>> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 +-- >>>> arch/x86/kernel/cpu/resctrl/internal.h | 2 +- >>>> arch/x86/kernel/cpu/resctrl/monitor.c | 42 +++++++++++++++-------- >>>> include/linux/resctrl.h | 1 + >>>> 4 files changed, 31 insertions(+), 18 deletions(-) >>>> >>>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c >>>> b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c >>>> index 25baacd331e0..c8ca7184c6d9 100644 >>>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c >>>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c >>>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void >>>> *arg) >>>> mon_event_read(&rr, r, d, rdtgrp, evtid, false); >>>> - if (rr.val & RMID_VAL_ERROR) >>>> + if (rr.err == -EIO) >>>> seq_puts(m, "Error\n"); >>>> - else if (rr.val & RMID_VAL_UNAVAIL) >>>> + else if (rr.err == -EINVAL) >>>> seq_puts(m, "Unavailable\n"); >>>> else >>>> seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale); >>> >>> This patch breaks the earlier fix >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fh%3Dv5.15-rc6%26id%3D064855a69003c24bd6b473b367d364e418c57625&data=04%7C01%7Cbabu.moger%40amd.com%7C85219a5827114935cdaa08d993f59fa0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637703505420472920%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=yP8awDgGGZ%2BWj5ZItdTNJItTVuK828yGnibwq%2BrVaf0%3D&reserved=0 >>> >>> >>> When the user reads the events on the default monitoring group with >>> multiple subgroups, the events on all subgroups are consolidated >>> together. In case if the last rmid read was resulted in error then whole >>> group will be reported as error. The err field needs to be cleared. >>> >>> Please add this patch to clear the error. >>> >>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c >>> b/arch/x86/kernel/cpu/resctrl/monitor.c >>> index 14bc843043da..0e4addf237ec 100644 >>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c >>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c >>> @@ -444,6 +444,8 @@ void mon_event_count(void *info) >>> /* Report error if none of rmid_reads are successful */ >>> if (ret_val) >>> rr->val = ret_val; >>> + else >>> + rr->err = 0; >>> } >>> >>> /* >>> >> >> Good catch, thank you. >> >> Even so, I do not think mon_event_count()'s usage of __mon_event_count() >> was taken into account by this patch and needs a bigger rework than the >> above fixup. For example, if I understand correctly ret_val is the error >> and rr->val no longer expected to contain the error after this patch. So >> keeping that assignment to rr->val is not correct. > > Yes. You are right. rr->val is not expected to contain the error. > Hopefully, this should help. > > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c > b/arch/x86/kernel/cpu/resctrl/monitor.c > index 14bc843043da..105d972cc511 100644 > --- a/arch/x86/kernel/cpu/resctrl/monitor.c > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c > @@ -441,9 +441,9 @@ void mon_event_count(void *info) > } > } > > - /* Report error if none of rmid_reads are successful */ > - if (ret_val) > - rr->val = ret_val; > + /* Clear the error if at least one of the rmid reads succeed */ > + if (ret_val == 0) > + rr->err = 0; > } > > /* > Yes, this looks good. If the first __mon_event_count() succeeds but a following one fails then the data still needs to be reported so the error code needs to be fixed up afterwards and cannot be done inside __mon_event_count(). Thank you very much. Reinette
Powered by blists - more mailing lists