Message-ID: <dd1031c7-6134-5dfe-6113-3f0325f31663@intel.com>
Date:   Fri, 16 Dec 2022 14:29:30 -0800
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Peter Newman <peternewman@...gle.com>
CC:     Fenghua Yu <fenghua.yu@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        James Morse <james.morse@....com>,
        Shaopeng Tan <tan.shaopeng@...itsu.com>,
        Jamie Iles <quic_jiles@...cinc.com>,
        <linux-kernel@...r.kernel.org>, <eranian@...gle.com>,
        Babu Moger <Babu.Moger@....com>
Subject: Re: [PATCH] x86/resctrl: Fix event counts regression in reused RMIDs

Hi Peter,

On 12/16/2022 5:54 AM, Peter Newman wrote:
> On Wed, Dec 14, 2022 at 8:17 PM Reinette Chatre
> <reinette.chatre@...el.com> wrote:
>> On 12/14/2022 6:21 AM, Peter Newman wrote:
>>> mbm_state is arch-independent, so I think putting it here would require
>>> the MPAM version to copy this and for get_mbm_state() to be exported.
>>
>> You are correct, it is arch independent ... so every arch is expected to
>> have it.
>> I peeked at your series and that looks good also - having cleanup done in
>> a central place helps to avoid future mistakes.
>>
>>>>         am = get_arch_mbm_state(hw_dom, rmid, eventid);
>>>>         if (am) {
>>>>                 memset(am, 0, sizeof(*am));
>>>>                 /* Record any initial, non-zero count value. */
>>>>                 ret = __rmid_read(rmid, eventid, &val);
>>>>                 if (!ret)
>>>>                         am->prev_msr = val;
>>>>         }
>>>>
>>>> }
>>>>
>>>> Having this would be helpful as reference to Babu's usage.
>>>
>>> His usage looks a little different.
>>>
>>> According to the comment in Babu's patch:
>>>
>>> https://lore.kernel.org/lkml/166990903030.17806.5106229901730558377.stgit@bmoger-ubuntu/
>>>
>>> +       /*
>>> +        * When an Event Configuration is changed, the bandwidth counters
>>> +        * for all RMIDs and Events will be cleared by the hardware. The
>>> +        * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
>>> +        * every RMID on the next read to any event for every RMID.
>>> +        * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
>>> +        * cleared while it is tracked by the hardware. Clear the
>>> +        * mbm_local and mbm_total counts for all the RMIDs.
>>> +        */
>>> +       resctrl_arch_reset_rmid_all(r, d);
>>>
>>> If all the hardware counters are zeroed as the comment suggests, then
>>> leaving am->prev_msr zero seems correct. __rmid_read() would likely
>>> return an error anyways. The bug I was addressing was one of reusing
>>> an RMID which had not been reset.
>>
>> You are correct, but there are two things to keep in mind though:
>> * the change from which you copied the above snippet introduces a new
>>   _generic_ utility far away from this call site. It is thus reasonable to
>>   assume that this utility should work for all use cases, not just the one
>>   for which it is created. Since there are no other use cases at this time,
>>   this may be ok, but I think at minimum the utility will benefit from
>>   a snippet indicating the caveats of its use as a heads up to any future users.
>> * the utility does not clear struct mbm_state contents. Again, this is ok
>>   for this usage since AMD does not support the software controller but
>>   as far as a generic utility goes the usage should be clear to avoid
>>   traps for future changes.
> 
> To this end, would it help if I pulled the rr->first case into a
> separate function like this:
> 
> -               resctrl_arch_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
> -               m = get_mbm_state(rr->d, rmid, rr->evtid);
> -               if (m)
> -                       memset(m, 0, sizeof(struct mbm_state));
> +               resctrl_reset_rmid(rr->r, rr->d, rmid, rr->evtid);
> 
> I'm open to suggestions on the name.

This email thread started to talk about two generic utilities: the one relevant
to this fix (resctrl_arch_reset_rmid()) and the one being created by Babu
(resctrl_arch_reset_rmid_all()). Focusing on the one related to this fix, I do
think the way in which the utility is used in V2 makes it clear how cleanup
should be done. I could have been more explicit, but that is what I meant earlier
when saying that having the cleanup done in a central place looks good.
Any future scenario would have a good reference to follow and, if needed, a new
utility can be created at that time.

Reinette
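
For readers following the thread, here is a minimal user-space sketch of the
reset logic quoted above: clear the per-RMID state, then record any initial,
non-zero counter value as the baseline so that a reused RMID does not report
counts it inherited. This is not the kernel implementation. Every name here
(fake_hw_counter, fake_rmid_read, sketch_reset_rmid, sketch_read_count, and
the two-field struct) is a hypothetical stand-in, and counter-width overflow
handling is deliberately left out.

```c
#include <stdint.h>
#include <string.h>

#define NUM_RMIDS 4

/* Hypothetical stand-in for the hardware counter behind each RMID. */
static uint64_t fake_hw_counter[NUM_RMIDS];

/* Simplified per-RMID state, reduced to what the sketch needs. */
struct arch_mbm_state {
	uint64_t prev_msr;	/* last raw counter value seen */
	uint64_t chunks;	/* count accumulated since the last reset */
};

static struct arch_mbm_state arch_state[NUM_RMIDS];

/* Simulated counter read: 0 on success, -1 for an invalid RMID. */
static int fake_rmid_read(unsigned int rmid, uint64_t *val)
{
	if (rmid >= NUM_RMIDS)
		return -1;
	*val = fake_hw_counter[rmid];
	return 0;
}

/*
 * Clear the software state, then record any initial, non-zero count so
 * that later deltas start from the current hardware value, not from zero.
 */
static void sketch_reset_rmid(unsigned int rmid)
{
	uint64_t val;

	if (rmid >= NUM_RMIDS)
		return;

	memset(&arch_state[rmid], 0, sizeof(arch_state[rmid]));
	if (!fake_rmid_read(rmid, &val))
		arch_state[rmid].prev_msr = val;
}

/* Accumulate the delta since the previous read; return the running total. */
static uint64_t sketch_read_count(unsigned int rmid)
{
	uint64_t val;

	if (fake_rmid_read(rmid, &val))
		return 0;

	arch_state[rmid].chunks += val - arch_state[rmid].prev_msr;
	arch_state[rmid].prev_msr = val;
	return arch_state[rmid].chunks;
}
```

In this sketch, if fake_hw_counter[1] already holds a stale value when
sketch_reset_rmid(1) runs, the next sketch_read_count(1) reports only the
traffic counted after the reset. Leaving prev_msr at zero would instead fold
the stale value into the first reading, which is the regression the patch
subject describes.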
