lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 20 Oct 2021 11:15:28 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Babu Moger <babu.moger@....com>, James Morse <james.morse@....com>,
        <x86@...nel.org>, <linux-kernel@...r.kernel.org>
CC:     Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        <shameerali.kolothum.thodi@...wei.com>,
        Jamie Iles <jamie@...iainc.com>,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        <lcherian@...vell.com>, <bobo.shaobowang@...wei.com>,
        <tan.shaopeng@...itsu.com>
Subject: Re: [PATCH v2 17/23] x86/resctrl: Abstract __rmid_read()

Hi Babu,

On 10/19/2021 4:20 PM, Babu Moger wrote:
> Hi James,
> 
> On 10/1/21 11:02 AM, James Morse wrote:
>> __rmid_read() selects the specified eventid and returns the counter
>> value from the msr. The error handling is architecture specific, and
>> handled by the callers, rdtgroup_mondata_show() and __mon_event_count().
>>
>> Error handling should be handled by architecture specific code, as
>> a different architecture may have different requirements. MPAM's
>> counters can report that they are 'not ready', requiring a second
>> read after a short delay. This should be hidden from resctrl.
>>
>> Make __rmid_read() the architecture specific function for reading
>> a counter. Rename it resctrl_arch_rmid_read() and move the error
>> handling into it.
>>
>> Signed-off-by: James Morse <james.morse@....com>
>> ---
>> Changes since v1:
>>   * Return EINVAL from the impossible case in __mon_event_count() instead
>>     of an x86 hardware specific value.
>> ---
>>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  4 +--
>>   arch/x86/kernel/cpu/resctrl/internal.h    |  2 +-
>>   arch/x86/kernel/cpu/resctrl/monitor.c     | 42 +++++++++++++++--------
>>   include/linux/resctrl.h                   |  1 +
>>   4 files changed, 31 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 25baacd331e0..c8ca7184c6d9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -579,9 +579,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>   
>>   	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>>   
>> -	if (rr.val & RMID_VAL_ERROR)
>> +	if (rr.err == -EIO)
>>   		seq_puts(m, "Error\n");
>> -	else if (rr.val & RMID_VAL_UNAVAIL)
>> +	else if (rr.err == -EINVAL)
>>   		seq_puts(m, "Unavailable\n");
>>   	else
>>   		seq_printf(m, "%llu\n", rr.val * hw_res->mon_scale);
> 
> This patch breaks the earlier fix
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.15-rc6&id=064855a69003c24bd6b473b367d364e418c57625
> 
> When the user reads the events on the default monitoring group with
> multiple subgroups, the events on all subgroups are consolidated
> together. In case if the last rmid read was resulted in error then whole
> group will be reported as error. The err field needs to be cleared.
> 
> Please add this patch to clear the error.
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
> b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 14bc843043da..0e4addf237ec 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -444,6 +444,8 @@ void mon_event_count(void *info)
>          /* Report error if none of rmid_reads are successful */
>          if (ret_val)
>                  rr->val = ret_val;
> +       else
> +               rr->err  = 0;
>   }
> 
>   /*
> 

Good catch, thank you.

Even so, I do not think mon_event_count()'s usage of __mon_event_count() 
was taken into account by this patch and needs a bigger rework than the 
above fixup. For example, if I understand correctly ret_val is the error 
and rr->val no longer expected to contain the error after this patch. So 
keeping that assignment to rr->val is not correct.

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ