[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3061ec99-c418-b1a3-7975-8a6b9d0bdd14@intel.com>
Date: Tue, 3 Dec 2024 20:16:23 -0800
From: Fenghua Yu <fenghua.yu@...el.com>
To: Babu Moger <babu.moger@....com>, <corbet@....net>,
<reinette.chatre@...el.com>, <tglx@...utronix.de>, <mingo@...hat.com>,
<bp@...en8.de>, <dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <thuth@...hat.com>,
<paulmck@...nel.org>, <rostedt@...dmis.org>, <akpm@...ux-foundation.org>,
<xiongwei.song@...driver.com>, <pawan.kumar.gupta@...ux.intel.com>,
<daniel.sneddon@...ux.intel.com>, <perry.yuan@....com>,
<sandipan.das@....com>, <kai.huang@...el.com>, <xiaoyao.li@...el.com>,
<seanjc@...gle.com>, <jithu.joseph@...el.com>, <brijesh.singh@....com>,
<xin3.li@...el.com>, <ebiggers@...gle.com>, <andrew.cooper3@...rix.com>,
<mario.limonciello@....com>, <james.morse@....com>,
<tan.shaopeng@...itsu.com>, <tony.luck@...el.com>,
<vikas.shivappa@...ux.intel.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <peternewman@...gle.com>,
<maciej.wieczor-retman@...el.com>, <eranian@...gle.com>,
<jpoimboe@...nel.org>, <thomas.lendacky@....com>
Subject: Re: [PATCH v9 20/26] x86/resctrl: Auto assign/unassign counters when
mbm_cntr_assign is enabled
Hi, Babu,
On 10/29/24 16:21, Babu Moger wrote:
> Assign/unassign counters on resctrl group creation/deletion. Two counters
> are required per group, one for MBM total event and one for MBM local
> event.
>
> There are a limited number of counters available for assignment. If these
> counters are exhausted, the kernel will display the error message: "Out of
> MBM assignable counters". However, it is not necessary to fail the
> creation of a group due to assignment failures. Users have the flexibility
> to modify the assignments at a later time.
>
> Signed-off-by: Babu Moger <babu.moger@....com>
> ---
> v9: Changed rdtgroup_assign_cntrs() and rdtgroup_unassign_cntrs() to return void.
> Updated couple of rdtgroup_unassign_cntrs() calls properly.
> Updated function comments.
>
> v8: Renamed rdtgroup_assign_grp to rdtgroup_assign_cntrs.
> Renamed rdtgroup_unassign_grp to rdtgroup_unassign_cntrs.
> Fixed the problem with unassigning the child MON groups of CTRL_MON group.
>
> v7: Reworded the commit message.
> Removed the reference of ABMC with mbm_cntr_assign.
> Renamed the function rdtgroup_assign_cntrs to rdtgroup_assign_grp.
>
> v6: Removed the redundant comments on all the calls of
> rdtgroup_assign_cntrs. Updated the commit message.
> Dropped printing error message on every call of rdtgroup_assign_cntrs.
>
> v5: Removed the code to enable/disable ABMC during the mount.
> That will be another patch.
> Added arch callers to get the arch specific data.
> Renamed fuctions to match the other abmc function.
> Added code comments for assignment failures.
>
> v4: Few name changes based on the upstream discussion.
> Commit message update.
>
> v3: This is a new patch. Patch addresses the upstream comment to enable
> ABMC feature by default if the feature is available.
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 61 +++++++++++++++++++++++++-
> 1 file changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index b0cce3dfd062..a8d21b0b2054 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2932,6 +2932,46 @@ static void schemata_list_destroy(void)
> }
> }
>
> +/*
> + * Called when a new group is created. If "mbm_cntr_assign" mode is enabled,
> + * counters are automatically assigned. Each group can accommodate two counters:
> + * one for the total event and one for the local event. Assignments may fail
> + * due to the limited number of counters. However, it is not necessary to fail
> + * the group creation and thus no failure is returned. Users have the option
> + * to modify the counter assignments after the group has been created.
> + */
> +static void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> + return;
> +
> + if (is_mbm_total_enabled())
> + rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
In this code path,
resctrl_mkdir()->resctrl_mkdir_ctrl_mon()->rdtgroup_assign_cntrs()->rdtgroup_assign_cntr_event()
CPUs are not protected by read lock while rdtgroup_assign_cntr_event()
walks r->mon_domains and run assing counters code on CPUs in the
domains. Without CPU protection, r->mon_domains may race with CPU hotplug.
In another patch (i.e. rdt_get_tree()), rdtgroup_assign_cntrs() is
protected by cpus_read_lock()/unlock().
So maybe define two helpers:
// Called when caller takes cpus_read_lock()
rdtgroup_assign_cntrs_locked()
{
lockdep_assert_cpus_held();
then the current rdtgroup_assign_cntrs() code
}
// Called when caller doesn't take cpus_read_lock()
rdtgroup_assign_cntrs()
{
cpus_read_lock();
rdtgroup_assign_cntrs_locked();
cpus_read_unlock();
}
> +
> + if (is_mbm_local_enabled())
> + rdtgroup_assign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> +}
> +
> +/*
> + * Called when a group is deleted. Counters are unassigned if it was in
> + * assigned state.
> + */
> +static void rdtgroup_unassign_cntrs(struct rdtgroup *rdtgrp)
> +{
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> + if (!resctrl_arch_mbm_cntr_assign_enabled(r))
> + return;
> +
> + if (is_mbm_total_enabled())
> + rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
> + if (is_mbm_local_enabled())
> + rdtgroup_unassign_cntr_event(r, rdtgrp, NULL, QOS_L3_MBM_LOCAL_EVENT_ID);
> +}
> +
Seems rdtgroup_unassign_cntrs() is always protected by
cpus_read_lock()/unlock(). So it's good.
> static int rdt_get_tree(struct fs_context *fc)
> {
> struct rdt_fs_context *ctx = rdt_fc2context(fc);
> @@ -2991,6 +3031,8 @@ static int rdt_get_tree(struct fs_context *fc)
> if (ret < 0)
> goto out_mongrp;
> rdtgroup_default.mon.mon_data_kn = kn_mondata;
> +
> + rdtgroup_assign_cntrs(&rdtgroup_default);
In this case, cpus_read_lock() was called earlier. Change to
rdtgroup_assign_cntrs_locked().
> }
>
> ret = rdt_pseudo_lock_init();
> @@ -3021,8 +3063,10 @@ static int rdt_get_tree(struct fs_context *fc)
> out_psl:
> rdt_pseudo_lock_release();
> out_mondata:
> - if (resctrl_arch_mon_capable())
> + if (resctrl_arch_mon_capable()) {
> + rdtgroup_unassign_cntrs(&rdtgroup_default);
> kernfs_remove(kn_mondata);
> + }
> out_mongrp:
> if (resctrl_arch_mon_capable())
> kernfs_remove(kn_mongrp);
> @@ -3201,6 +3245,7 @@ static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
>
> head = &rdtgrp->mon.crdtgrp_list;
> list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
> + rdtgroup_unassign_cntrs(sentry);
> free_rmid(sentry->closid, sentry->mon.rmid);
> list_del(&sentry->mon.crdtgrp_list);
>
> @@ -3241,6 +3286,8 @@ static void rmdir_all_sub(void)
> cpumask_or(&rdtgroup_default.cpu_mask,
> &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
>
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>
> kernfs_remove(rdtgrp->kn);
> @@ -3272,6 +3319,7 @@ static void rdt_kill_sb(struct super_block *sb)
> for_each_alloc_capable_rdt_resource(r)
> reset_all_ctrls(r);
> rmdir_all_sub();
> + rdtgroup_unassign_cntrs(&rdtgroup_default);
> rdt_pseudo_lock_release();
> rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> schemata_list_destroy();
> @@ -3280,6 +3328,7 @@ static void rdt_kill_sb(struct super_block *sb)
> resctrl_arch_disable_alloc();
> if (resctrl_arch_mon_capable())
> resctrl_arch_disable_mon();
> +
Unnecessary change.
> resctrl_mounted = false;
> kernfs_kill_sb(sb);
> mutex_unlock(&rdtgroup_mutex);
> @@ -3871,6 +3920,8 @@ static int rdtgroup_mkdir_mon(struct kernfs_node *parent_kn,
> goto out_unlock;
> }
>
> + rdtgroup_assign_cntrs(rdtgrp);
> + > kernfs_activate(rdtgrp->kn);
>
> /*
> @@ -3915,6 +3966,8 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
> if (ret)
> goto out_closid_free;
>
> + rdtgroup_assign_cntrs(rdtgrp);
> +
> kernfs_activate(rdtgrp->kn);
>
> ret = rdtgroup_init_alloc(rdtgrp);
> @@ -3940,6 +3993,7 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
> out_del_list:
> list_del(&rdtgrp->rdtgroup_list);
> out_rmid_free:
> + rdtgroup_unassign_cntrs(rdtgrp);
> mkdir_rdt_prepare_rmid_free(rdtgrp);
> out_closid_free:
> closid_free(closid);
> @@ -4010,6 +4064,9 @@ static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> update_closid_rmid(tmpmask, NULL);
>
> rdtgrp->flags = RDT_DELETED;
> +
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
>
> /*
> @@ -4056,6 +4113,8 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
> update_closid_rmid(tmpmask, NULL);
>
> + rdtgroup_unassign_cntrs(rdtgrp);
> +
> free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
> closid_free(rdtgrp->closid);
>
Thanks.
-Fenghua
Powered by blists - more mailing lists