Message-ID: <3f294997-7f29-4c15-8c4d-12b016b768cb@intel.com>
Date: Mon, 10 Feb 2025 10:34:32 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: <babu.moger@....com>, <corbet@....net>, <tglx@...utronix.de>,
<mingo@...hat.com>, <bp@...en8.de>, <dave.hansen@...ux.intel.com>,
<tony.luck@...el.com>, <peternewman@...gle.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <paulmck@...nel.org>,
<akpm@...ux-foundation.org>, <thuth@...hat.com>, <rostedt@...dmis.org>,
<xiongwei.song@...driver.com>, <pawan.kumar.gupta@...ux.intel.com>,
<daniel.sneddon@...ux.intel.com>, <jpoimboe@...nel.org>,
<perry.yuan@....com>, <sandipan.das@....com>, <kai.huang@...el.com>,
<xiaoyao.li@...el.com>, <seanjc@...gle.com>, <xin3.li@...el.com>,
<andrew.cooper3@...rix.com>, <ebiggers@...gle.com>,
<mario.limonciello@....com>, <james.morse@....com>,
<tan.shaopeng@...itsu.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <maciej.wieczor-retman@...el.com>,
<eranian@...gle.com>
Subject: Re: [PATCH v11 17/23] x86/resctrl: Auto assign/unassign counters when
mbm_cntr_assign is enabled
Hi Babu,

On 2/10/25 9:27 AM, Moger, Babu wrote:
> On 2/6/25 12:03, Reinette Chatre wrote:
>> On 1/22/25 12:20 PM, Babu Moger wrote:
>>
>>> +	 * of hardware counter is not considered as an overflow in the
>>> +	 * next update.
>>> +	 */
>>> +	if (is_mbm_enabled() && r->mon.mbm_cntr_assignable) {
>>> +		list_for_each_entry(dom, &r->mon_domains, hdr.list) {
>>> +			memset(dom->cntr_cfg, 0,
>>> +			       sizeof(*dom->cntr_cfg) * r->mon.num_mbm_cntrs);
>>> +			if (is_mbm_total_enabled())
>>> +				memset(dom->mbm_total, 0,
>>> +				       sizeof(struct mbm_state) * idx_limit);
>>> +			if (is_mbm_local_enabled())
>>> +				memset(dom->mbm_local, 0,
>>> +				       sizeof(struct mbm_state) * idx_limit);
>>> +			resctrl_arch_reset_rmid_all(r, dom);
>>> +		}
>>> +	}
>>> +}
>>
>> I looked back at the previous versions to better understand how this function
>> came about and I do not think it actually solves the problem it aims to solve.
>>
>> rdtgroup_unassign_cntrs() can fail and when it does the counter is not free'd. That
>> leaves a monitoring domain's array with an entry that points to a resource group
>> that no longer exists (unless it is the default resource group) since
>> rdtgroup_unassign_cntrs() does not check the return and proceeds to remove the
>> resource group. mbm_cntr_reset() is called on umount of resctrl but
>> rdtgroup_unassign_cntrs() is called on every group remove and those scenarios
>> are not handled.
>>
>> To address this I believe that I need to go back on a previous request to have
>> resctrl_arch_config_cntr() return an error code. AMD does not need this and
>> it is difficult to predict what will work for MPAM. I originally wanted to be
>> flexible here but this appears to be impractical. With a new requirement that
>> resctrl_arch_config_cntr() always succeeds the counter will in turn always
>> be free'd and not leave dangling pointers. I believe doing so eliminates
>> the need for mbm_cntr_reset() as used in this patch. My apologies for the
>> misdirection. We can re-evaluate these flows if MPAM needs anything different.
>
> So, new requirement is to free the counter even if the
> resctrl_arch_config_cntr() call fails. That way after calling

No. Quoting above: "new requirement that resctrl_arch_config_cntr() always succeeds".
As I see it this will eliminate a lot of error checking on the calling path,
not ignore errors.
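
To illustrate, here is a rough userspace sketch of the shape this could take.
It is not the actual resctrl code: every toy_* name, the struct layouts and
NUM_CNTRS are made up for illustration, and only the idea (the arch hook
returns void, so the unassign path has no failure case and always clears the
entry) comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_CNTRS 4	/* hypothetical, stands in for r->mon.num_mbm_cntrs */

/* Toy stand-ins for the resource group, a cntr_cfg entry and a monitoring domain. */
struct toy_group { int closid; int rmid; };
struct toy_cntr_cfg { struct toy_group *grp; int evtid; };
struct toy_domain { struct toy_cntr_cfg cntr_cfg[NUM_CNTRS]; };

/* Models the arch hook under the new rule: void, so callers have no error to handle. */
static void toy_arch_config_cntr(struct toy_domain *d, int cntr_id, int evtid, int assign)
{
	assert(d && cntr_id >= 0 && cntr_id < NUM_CNTRS);
	(void)evtid;
	(void)assign;	/* real arch code would program the hardware here */
}

/* Take the first free counter for grp; returns its index, or -1 if none is free. */
static int toy_assign_cntr(struct toy_domain *d, struct toy_group *grp, int evtid)
{
	for (int i = 0; i < NUM_CNTRS; i++) {
		if (d->cntr_cfg[i].grp)
			continue;
		toy_arch_config_cntr(d, i, evtid, 1);
		d->cntr_cfg[i].grp = grp;
		d->cntr_cfg[i].evtid = evtid;
		return i;
	}
	return -1;
}

/* Release every counter owned by grp. There is no error path, so no entry is left behind. */
static int toy_unassign_cntrs(struct toy_domain *d, struct toy_group *grp)
{
	int freed = 0;

	for (int i = 0; i < NUM_CNTRS; i++) {
		if (d->cntr_cfg[i].grp != grp)
			continue;
		toy_arch_config_cntr(d, i, d->cntr_cfg[i].evtid, 0);
		memset(&d->cntr_cfg[i], 0, sizeof(d->cntr_cfg[i]));
		freed++;
	}
	return freed;
}

/* Count entries still pointing at grp; should be 0 once the group is removed. */
static int toy_count_refs(const struct toy_domain *d, const struct toy_group *grp)
{
	int refs = 0;

	for (int i = 0; i < NUM_CNTRS; i++)
		if (d->cntr_cfg[i].grp == grp)
			refs++;
	return refs;
}
```

With this shape the group removal path can call the unassign helper and then
free the group without checking a return value, since no entry can keep
pointing at it.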

Reinette