linux-kernel - Re: [PATCH v4 1/7] x86/resctrl: Create separate domains for control and monitoring

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cc1a144f-6667-18fb-7fe7-cd15ebfedd08@intel.com>
Date:   Fri, 11 Aug 2023 10:29:25 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghua.yu@...el.com>,
        "Peter Newman" <peternewman@...gle.com>,
        Jonathan Corbet <corbet@....net>,
        Shuah Khan <skhan@...uxfoundation.org>, <x86@...nel.org>
CC:     Shaopeng Tan <tan.shaopeng@...itsu.com>,
        James Morse <james.morse@....com>,
        Jamie Iles <quic_jiles@...cinc.com>,
        Babu Moger <babu.moger@....com>,
        Randy Dunlap <rdunlap@...radead.org>,
        <linux-kernel@...r.kernel.org>, <linux-doc@...r.kernel.org>,
        <patches@...ts.linux.dev>
Subject: Re: [PATCH v4 1/7] x86/resctrl: Create separate domains for control
 and monitoring

Hi Tony,

On 7/22/2023 12:07 PM, Tony Luck wrote:
> First step towards supporting resource control where the scope of
> control operations is not the same as monitor operations.

Each changelog should stand on its own merit. This changelog
appears to be written as a continuation of the cover letter.

Please do ensure that each patch first establishes the context
before it describes the problem and solution. For example,
as a context this changelog can start by describing what the
resctrl domains list represents.

> 
> Add an extra list in the rdt_resource structure. For this will
> just duplicate the existing list of domains based on the L3 cache
> scope.

The above paragraph does not make this change appealing at all.

> Refactor the domain_add_cpu() and domain_remove() functions to

domain_remove() -> domain_remove_cpu()

> build separate lists for r->alloc_capable and r->mon_capable
> resources. Note that only the "L3" domain currently supports
> both types.

"L3" domain -> "L3" resource?

> 
> Change all places where monitoring functions walk the list of
> domains to use the new "mondomains" list instead of the old
> "domains" list.

I would not refer to it as "the old domains list" as it creates
impression that this is being replaced. The changelog makes
no mention that domains list will remain and be dedicated to
control domains. I think this is important to include in description
of this change.

> 
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
>  include/linux/resctrl.h                   |  10 +-
>  arch/x86/kernel/cpu/resctrl/internal.h    |   2 +-
>  arch/x86/kernel/cpu/resctrl/core.c        | 195 +++++++++++++++-------
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |   2 +-
>  arch/x86/kernel/cpu/resctrl/monitor.c     |   2 +-
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  30 ++--
>  6 files changed, 167 insertions(+), 74 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 8334eeacfec5..1267d56f9e76 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -151,9 +151,11 @@ struct resctrl_schema;
>   * @mon_capable:	Is monitor feature available on this machine
>   * @num_rmid:		Number of RMIDs available
>   * @cache_level:	Which cache level defines scope of this resource
> + * @mon_scope:		Scope of this resource if different from cache_level

I think this addition should be deferred. As it is here it the "if different
from cache_level" also creates many questions (when will it be different?
how will it be determined that the scope is different in order to know that
mon_scope should be used?)

Looking ahead on how mon_scope is used there does not seem to be an "if"
involved at all ... mon_scope is always the monitoring scope. 

>   * @cache:		Cache allocation related data
>   * @membw:		If the component has bandwidth controls, their properties.
>   * @domains:		All domains for this resource

A change to the domains comment would also help - to highlight that it is
now dedicated to control domains.

> + * @mondomains:		Monitor domains for this resource
>   * @name:		Name to use in "schemata" file.
>   * @data_width:		Character width of data when displaying
>   * @default_ctrl:	Specifies default cache cbm or memory B/W percent.
> @@ -169,9 +171,11 @@ struct rdt_resource {
>  	bool			mon_capable;
>  	int			num_rmid;
>  	int			cache_level;
> +	int			mon_scope;
>  	struct resctrl_cache	cache;
>  	struct resctrl_membw	membw;
>  	struct list_head	domains;
> +	struct list_head	mondomains;
>  	char			*name;
>  	int			data_width;
>  	u32			default_ctrl;

...

> @@ -384,14 +386,15 @@ void rdt_ctrl_update(void *arg)
>  }
>  
>  /*
> - * rdt_find_domain - Find a domain in a resource that matches input resource id
> + * rdt_find_domain - Find a domain in one of the lists for a resource that
> + * matches input resource id
>   *

This change makes the function more vague. I think original summary is
still accurate, how the list is used can be describe in the details below.
I see more changes to this function is upcoming and I will comment more 
at those sites.

>   * Search resource r's domain list to find the resource id. If the resource
>   * id is found in a domain, return the domain. Otherwise, if requested by
>   * caller, return the first domain whose id is bigger than the input id.
>   * The domain list is sorted by id in ascending order.
>   */
> -struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
> +struct rdt_domain *rdt_find_domain(struct list_head *h, int id,
>  				   struct list_head **pos)
>  {
>  	struct rdt_domain *d;
> @@ -400,7 +403,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
>  	if (id < 0)
>  		return ERR_PTR(-ENODEV);
>  
> -	list_for_each(l, &r->domains) {
> +	list_for_each(l, h) {
>  		d = list_entry(l, struct rdt_domain, list);
>  		/* When id is found, return its domain. */
>  		if (id == d->id)
> @@ -487,6 +490,94 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
>  	return 0;
>  }
>  
> +static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
> +{
> +	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> +	struct list_head *add_pos = NULL;
> +	struct rdt_hw_domain *hw_dom;
> +	struct rdt_domain *d;
> +	int err;
> +
> +	d = rdt_find_domain(&r->domains, id, &add_pos);
> +	if (IS_ERR(d)) {
> +		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> +		return;
> +	}
> +
> +	if (d) {
> +		cpumask_set_cpu(cpu, &d->cpu_mask);
> +		if (r->cache.arch_has_per_cpu_cfg)
> +			rdt_domain_reconfigure_cdp(r);
> +		return;
> +	}
> +
> +	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> +	if (!hw_dom)
> +		return;
> +
> +	d = &hw_dom->d_resctrl;
> +	d->id = id;
> +	cpumask_set_cpu(cpu, &d->cpu_mask);
> +
> +	rdt_domain_reconfigure_cdp(r);
> +
> +	if (domain_setup_ctrlval(r, d)) {
> +		domain_free(hw_dom);
> +		return;
> +	}
> +
> +	list_add_tail(&d->list, add_pos);
> +
> +	err = resctrl_online_ctrl_domain(r, d);
> +	if (err) {
> +		list_del(&d->list);
> +		domain_free(hw_dom);
> +	}
> +}
> +
> +static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> +{
> +	int id = get_cpu_cacheinfo_id(cpu, r->mon_scope);

Using a different scope variable but continuing to treat it
as a cache level creates unnecessary confusion at this point.

> +	struct list_head *add_pos = NULL;
> +	struct rdt_hw_domain *hw_dom;
> +	struct rdt_domain *d;
> +	int err;
> +
> +	d = rdt_find_domain(&r->mondomains, id, &add_pos);
> +	if (IS_ERR(d)) {
> +		pr_warn("Couldn't find cache id for CPU %d\n", cpu);

Note for future change ... this continues to refer to monitor scope as
a cache id. I did not see this changed in the later patch that actually
changes how scope is used.

> +		return;
> +	}
> +
> +	if (d) {
> +		cpumask_set_cpu(cpu, &d->cpu_mask);
> +		if (r->cache.arch_has_per_cpu_cfg)
> +			rdt_domain_reconfigure_cdp(r);

Copy & paste error?

> +		return;
> +	}
> +
> +	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> +	if (!hw_dom)
> +		return;
> +
> +	d = &hw_dom->d_resctrl;
> +	d->id = id;
> +	cpumask_set_cpu(cpu, &d->cpu_mask);
> +
> +	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> +		domain_free(hw_dom);
> +		return;
> +	}
> +
> +	list_add_tail(&d->list, add_pos);
> +
> +	err = resctrl_online_mon_domain(r, d);
> +	if (err) {
> +		list_del(&d->list);
> +		domain_free(hw_dom);
> +	}
> +}
> +
>  /*
>   * domain_add_cpu - Add a cpu to a resource's domain list.
>   *
> @@ -502,61 +593,19 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
>   */
>  static void domain_add_cpu(int cpu, struct rdt_resource *r)
>  {
> -	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
> -	struct list_head *add_pos = NULL;
> -	struct rdt_hw_domain *hw_dom;
> -	struct rdt_domain *d;
> -	int err;
> -
> -	d = rdt_find_domain(r, id, &add_pos);
> -	if (IS_ERR(d)) {
> -		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> -		return;
> -	}
> -
> -	if (d) {
> -		cpumask_set_cpu(cpu, &d->cpu_mask);
> -		if (r->cache.arch_has_per_cpu_cfg)
> -			rdt_domain_reconfigure_cdp(r);
> -		return;
> -	}
> -
> -	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
> -	if (!hw_dom)
> -		return;
> -
> -	d = &hw_dom->d_resctrl;
> -	d->id = id;
> -	cpumask_set_cpu(cpu, &d->cpu_mask);
> -
> -	rdt_domain_reconfigure_cdp(r);
> -
> -	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
> -		domain_free(hw_dom);
> -		return;
> -	}
> -
> -	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
> -		domain_free(hw_dom);
> -		return;
> -	}
> -
> -	list_add_tail(&d->list, add_pos);
> -
> -	err = resctrl_online_domain(r, d);
> -	if (err) {
> -		list_del(&d->list);
> -		domain_free(hw_dom);
> -	}
> +	if (r->alloc_capable)
> +		domain_add_cpu_ctrl(cpu, r);
> +	if (r->mon_capable)
> +		domain_add_cpu_mon(cpu, r);
>  }

A resource could be both alloc and mon capable ... both
domain_add_cpu_ctrl() and domain_add_cpu_mon() can fail.
Should domain_add_cpu_mon() still be run for a CPU if
domain_add_cpu_ctrl() failed? 

Looking ahead the CPU should probably also not be added
to the default groups mask if a failure occurred.

> -static void domain_remove_cpu(int cpu, struct rdt_resource *r)
> +static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
>  {
>  	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
>  	struct rdt_hw_domain *hw_dom;
>  	struct rdt_domain *d;
>  
> -	d = rdt_find_domain(r, id, NULL);
> +	d = rdt_find_domain(&r->domains, id, NULL);
>  	if (IS_ERR_OR_NULL(d)) {
>  		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
>  		return;
> @@ -565,7 +614,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  
>  	cpumask_clear_cpu(cpu, &d->cpu_mask);
>  	if (cpumask_empty(&d->cpu_mask)) {
> -		resctrl_offline_domain(r, d);
> +		resctrl_offline_ctrl_domain(r, d);
>  		list_del(&d->list);
>  
>  		/*
> @@ -578,6 +627,30 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  
>  		return;
>  	}
> +}
> +
> +static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
> +{
> +	int id = get_cpu_cacheinfo_id(cpu, r->cache_level);

Introducing mon_scope can really be deferred ... here the monitoring code
is not using mon_scope anyway.

> +	struct rdt_hw_domain *hw_dom;
> +	struct rdt_domain *d;
> +
> +	d = rdt_find_domain(&r->mondomains, id, NULL);
> +	if (IS_ERR_OR_NULL(d)) {
> +		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
> +		return;
> +	}
> +	hw_dom = resctrl_to_arch_dom(d);
> +
> +	cpumask_clear_cpu(cpu, &d->cpu_mask);
> +	if (cpumask_empty(&d->cpu_mask)) {
> +		resctrl_offline_mon_domain(r, d);
> +		list_del(&d->list);
> +
> +		domain_free(hw_dom);
> +
> +		return;
> +	}
>  
>  	if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
>  		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {

Reinette