linux-kernel - Re: [PATCH v3 3/3] x86/resctrl: Implement rename op for mon groups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b14598a8-2ab1-eda2-e1fb-a5f568e869d5@intel.com>
Date:   Fri, 10 Feb 2023 15:21:30 -0800
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Peter Newman <peternewman@...gle.com>, <fenghua.yu@...el.com>
CC:     <Babu.Moger@....com>, <bp@...en8.de>,
        <dave.hansen@...ux.intel.com>, <eranian@...gle.com>,
        <gupasani@...gle.com>, <hpa@...or.com>, <james.morse@....com>,
        <linux-kernel@...r.kernel.org>, <mingo@...hat.com>,
        <skodak@...gle.com>, <tony.luck@...el.com>, <tglx@...utronix.de>,
        <x86@...nel.org>
Subject: Re: [PATCH v3 3/3] x86/resctrl: Implement rename op for mon groups

Hi Peter,

On 1/25/2023 2:13 AM, Peter Newman wrote:
> To change the class of service for a large group of tasks, such as an
> application container, a container manager must write all of the tasks'
> IDs into the tasks file interface of the new control group.
> 
> If a container manager is tracking containers' bandwidth usage by
> placing tasks from each into their own monitoring group, it must first
> move the tasks to the default monitoring group of the new control group
> before it can move the tasks into their new monitoring groups. This is
> undesirable because it makes bandwidth usage during the move
> unattributable to the correct tasks and resets monitoring event counters
> and cache usage information for the group.
> 
> To address this, implement the rename operation for resctrlfs mon groups
> to effect a change in CLOSID for a MON group while otherwise leaving the
> monitoring group intact.
> 
> Signed-off-by: Peter Newman <peternewman@...gle.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 75 ++++++++++++++++++++++++++
>  1 file changed, 75 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index b2081bc1bbfd..595f83a517c6 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -3238,6 +3238,80 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
>  	return ret;
>  }
>  
> +static void mongrp_move(struct rdtgroup *rdtgrp, struct rdtgroup *new_prdtgrp,
> +			cpumask_var_t cpus)

Could you please add some function comments on what is moved and how it is accomplished?

> +{
> +	struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
> +	struct task_struct *p, *t;
> +
> +	WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list));
> +	list_del(&rdtgrp->mon.crdtgrp_list);
> +
> +	list_add_tail(&rdtgrp->mon.crdtgrp_list,
> +		      &new_prdtgrp->mon.crdtgrp_list);
> +	rdtgrp->mon.parent = new_prdtgrp;
> +
> +	read_lock(&tasklist_lock);
> +	for_each_process_thread(p, t) {
> +		if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp))
> +			rdt_move_one_task(t, new_prdtgrp->closid, t->rmid,
> +					  cpus);
> +	}
> +	read_unlock(&tasklist_lock);

Can rdt_move_group_tasks() be used here?

> +
> +	update_closid_rmid(cpus, NULL);
> +}

I see the tasks in a monitor group handled. There is another way to create/manage
a monitor group. For example, by not writing tasks to the tasks file but instead
to write CPU ids to the CPU file. All tasks on a particular CPU will be monitored
by that group. One rule is that a MON group may only have CPUs that are owned by
the CTRL_MON group.
It is not clear to me how such a group is handled in this work. 


> +
> +static int rdtgroup_rename(struct kernfs_node *kn,
> +			   struct kernfs_node *new_parent, const char *new_name)
> +{
> +	struct rdtgroup *new_prdtgrp;
> +	struct rdtgroup *rdtgrp;
> +	cpumask_var_t tmpmask;
> +	int ret;
> +
> +	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	rdtgrp = kernfs_to_rdtgroup(kn);
> +	new_prdtgrp = kernfs_to_rdtgroup(new_parent);
> +	if (!rdtgrp || !new_prdtgrp) {
> +		free_cpumask_var(tmpmask);
> +		return -EPERM;
> +	}
> +

How robust is this against user space attempting to move files? 

> +	/* Release both kernfs active_refs before obtaining rdtgroup mutex. */
> +	rdtgroup_kn_get(rdtgrp, kn);
> +	rdtgroup_kn_get(new_prdtgrp, new_parent);
> +
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) {
> +		ret = -ESRCH;
> +		goto out;
> +	}
> +
> +	/* Only a mon group can be moved to a new mon_groups directory. */
> +	if (rdtgrp->type != RDTMON_GROUP ||
> +	    !is_mon_groups(new_parent, kn->name)) {
> +		ret = -EPERM;
> +		goto out;
> +	}
> +

Should in-place moves be allowed?

> +	ret = kernfs_rename(kn, new_parent, new_name);
> +	if (ret)
> +		goto out;
> +
> +	mongrp_move(rdtgrp, new_prdtgrp, tmpmask);
> +

Can tmpmask allocation/free be done in mongrp_move()?

> +out:
> +	mutex_unlock(&rdtgroup_mutex);
> +	rdtgroup_kn_put(rdtgrp, kn);
> +	rdtgroup_kn_put(new_prdtgrp, new_parent);
> +	free_cpumask_var(tmpmask);
> +	return ret;
> +}
> +
>  static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
>  {
>  	if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3))
> @@ -3255,6 +3329,7 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf)
>  static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = {
>  	.mkdir		= rdtgroup_mkdir,
>  	.rmdir		= rdtgroup_rmdir,
> +	.rename		= rdtgroup_rename,
>  	.show_options	= rdtgroup_show_options,
>  };
>  

Reinette