linux-kernel - Re: [PATCH V15 11/11] x86,cgroup/intel_rdt : Add a cgroup interface to manage Intel cache allocation

Open Source and information security mailing list archives

Message-ID: <20151118221511.GA24721@amt.cnet>
Date:	Wed, 18 Nov 2015 20:15:12 -0200
From:	Marcelo Tosatti <mtosatti@...hat.com>
To:	Fenghua Yu <fenghua.yu@...el.com>
Cc:	H Peter Anvin <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	x86 <x86@...nel.org>,
	Vikas Shivappa <vikas.shivappa@...ux.intel.com>
Subject: Re: [PATCH V15 11/11] x86,cgroup/intel_rdt : Add a cgroup interface
 to manage Intel cache allocation

On Thu, Oct 01, 2015 at 11:09:45PM -0700, Fenghua Yu wrote:
> Add a new cgroup 'intel_rdt' to manage cache allocation. Each cgroup
> directory is associated with a class of service id (closid). To map a
> task to its closid during scheduling, this patch removes the closid
> field from task_struct and uses the already existing 'cgroups' field in
> task_struct.
> 
> The cgroup has a file 'l3_cbm' which represents the L3 cache capacity
> bitmask (CBM). The CBM is currently global for the whole system. The
> capacity bitmask must have only contiguous bits set, and the number of
> bits set cannot exceed the maximum CBM length. The tasks belonging to a
> cgroup get to fill the portion of the L3 cache represented by the
> capacity bitmask of the cgroup. For example, if the CBM is 10 bits long
> and the cache size is 10MB, each bit represents 1MB of cache capacity.
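A minimal user-space sketch of the two rules in the paragraph above: a valid CBM is non-zero, fits within the CBM length and has only contiguous bits set, and each bit stands for an equal share of the cache. `cbm_is_valid()` and `cbm_to_bytes()` are hypothetical names, not functions from the patch, and the equal-share arithmetic assumes the cache size divides evenly by the CBM length, as in the 10MB/10-bit example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: is cbm a valid capacity bitmask for a CBM that is
 * cbm_len bits long? It must be non-zero, have no bits above cbm_len,
 * and have all of its set bits contiguous. All bits set is allowed,
 * since the root cgroup uses that mask.
 */
static bool cbm_is_valid(uint32_t cbm, unsigned int cbm_len)
{
	uint32_t max_mask = (cbm_len >= 32) ? UINT32_MAX : ((1u << cbm_len) - 1);

	if (cbm == 0 || (cbm & ~max_mask))
		return false;

	/* Shift out trailing zeros; a contiguous run is now 0b0...01...1. */
	while (!(cbm & 1))
		cbm >>= 1;

	/* x & (x + 1) == 0 exactly when x has the form 2^k - 1. */
	return (cbm & (cbm + 1u)) == 0;
}

/*
 * Hypothetical helper: cache capacity covered by cbm, assuming each of
 * the cbm_len bits stands for an equal share of cache_bytes.
 */
static uint64_t cbm_to_bytes(uint32_t cbm, unsigned int cbm_len,
			     uint64_t cache_bytes)
{
	unsigned int set_bits = 0;

	assert(cbm_len > 0);
	for (; cbm; cbm &= cbm - 1)	/* clears the lowest set bit */
		set_bits++;
	return cache_bytes / cbm_len * set_bits;
}
```

With the numbers from the text, cbm_to_bytes(0x7, 10, 10 << 20) covers three 1MB slices (3MB), while 0x5 is rejected by cbm_is_valid() because bit 1 is missing between the two set bits.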
> 
> The root cgroup always has all the bits set in l3_cbm. Users can create
> more cgroups with the mkdir syscall. By default, child cgroups inherit
> the capacity bitmask (CBM) from their parent. Users can change the CBM,
> specified in hex, for each cgroup. Each unique bitmask is associated
> with a class of service ID, and -ENOSPC is returned once we run out of
> closids.
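A minimal sketch of the closid bookkeeping described above, assuming a small fixed table: cgroups with identical bitmasks share one refcounted closid, and a new bitmask either takes a free slot or fails with -ENOSPC. The struct fields mirror struct clos_cbm_table from the patch; NUM_CLOSIDS, clos_table, closid_get_for_cbm() and closid_put() are hypothetical, and the lookup logic is an assumption rather than the patch's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_CLOSIDS 4	/* hypothetical; hardware reports the real count */

/* Same fields as struct clos_cbm_table in the patch. */
struct clos_cbm_table {
	unsigned long l3_cbm;
	unsigned int clos_refcnt;
};

static struct clos_cbm_table clos_table[NUM_CLOSIDS];

/*
 * Hypothetical: find the closid for cbm. An identical bitmask already in
 * use shares its closid (refcount++); otherwise a free slot is taken.
 * Returns -ENOSPC when every closid is in use with a different mask.
 */
static int closid_get_for_cbm(unsigned long cbm, uint32_t *closid)
{
	int free_id = -1;

	for (int i = 0; i < NUM_CLOSIDS; i++) {
		if (clos_table[i].clos_refcnt && clos_table[i].l3_cbm == cbm) {
			clos_table[i].clos_refcnt++;
			*closid = i;
			return 0;
		}
		if (!clos_table[i].clos_refcnt && free_id < 0)
			free_id = i;
	}
	if (free_id < 0)
		return -ENOSPC;

	clos_table[free_id].l3_cbm = cbm;
	clos_table[free_id].clos_refcnt = 1;
	*closid = free_id;
	return 0;
}

/* Drop a reference; the closid becomes free again at refcount 0. */
static void closid_put(uint32_t closid)
{
	assert(closid < NUM_CLOSIDS && clos_table[closid].clos_refcnt > 0);
	clos_table[closid].clos_refcnt--;
}
```

With four closids, the masks 0x7f, 0x1f, 0x1f, 0x3 and 0x1 fit, because the two 0x1f cgroups share one closid; a fifth distinct mask then fails with -ENOSPC until a reference is dropped.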
> 
> Signed-off-by: Vikas Shivappa <vikas.shivappa@...ux.intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@...el.com>
> ---
>  arch/x86/include/asm/intel_rdt.h |  37 +++++++-
>  arch/x86/kernel/cpu/intel_rdt.c  | 194 +++++++++++++++++++++++++++++++++++++--
>  include/linux/cgroup_subsys.h    |   4 +
>  include/linux/sched.h            |   3 -
>  init/Kconfig                     |   4 +-
>  5 files changed, 229 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index afb6da3..fbe1e00 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -3,6 +3,7 @@
>  
>  #ifdef CONFIG_INTEL_RDT
>  
> +#include <linux/cgroup.h>
>  #include <linux/jump_label.h>
>  
>  #define MAX_CBM_LENGTH			32
> @@ -12,20 +13,54 @@
>  extern struct static_key rdt_enable_key;
>  void __intel_rdt_sched_in(void *dummy);
>  
> +struct intel_rdt {
> +	struct cgroup_subsys_state css;
> +	u32 closid;
> +};
> +
>  struct clos_cbm_table {
>  	unsigned long l3_cbm;
>  	unsigned int clos_refcnt;
>  };
>  
>  /*
> + * Return rdt group corresponding to this container.
> + */
> +static inline struct intel_rdt *css_rdt(struct cgroup_subsys_state *css)
> +{
> +	return css ? container_of(css, struct intel_rdt, css) : NULL;
> +}
> +
> +static inline struct intel_rdt *parent_rdt(struct intel_rdt *ir)
> +{
> +	return css_rdt(ir->css.parent);
> +}
> +
> +/*
> + * Return rdt group to which this task belongs.
> + */
> +static inline struct intel_rdt *task_rdt(struct task_struct *task)
> +{
> +	return css_rdt(task_css(task, intel_rdt_cgrp_id));
> +}
> +
> +/*
>   * intel_rdt_sched_in() - Writes the task's CLOSid to IA32_PQR_MSR
>   *
>   * Following considerations are made so that this has minimal impact
>   * on scheduler hot path:
>   * - This will stay as no-op unless we are running on an Intel SKU
>   * which supports L3 cache allocation.
> + * - When support is present and enabled, does not do any
> + * IA32_PQR_MSR writes until the user starts really using the feature
> + * i.e. creates an rdt cgroup directory and assigns a cache_mask that's
> + * different from the root cgroup's cache_mask.
>   * - Caches the per cpu CLOSid values and does the MSR write only
> - * when a task with a different CLOSid is scheduled in.

Why is this even allowed? 

	socket CBM bits:

 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
[ | | | | | | | | | |  |  |  |  |  ]

 x x x x x x x
                      x  x  x  x  x

 x x x x x

cgroupA.bits   = [ 0 - 6 ]   cgroupB.bits = [ 10 - 14 ]   (level 1)
cgroupA-A.bits = [ 0 - 4 ]                                (level 2)
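Read as bitmasks, the three ranges in the diagram are cgroupA = 0x7f, cgroupB = 0x7c00 and cgroupA-A = 0x1f, assuming bit 0 is the least significant bit. A small sketch that computes them and checks the parent/child relation; range_mask() and cbm_is_subset() are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with bits lo..hi (inclusive) set. */
static uint32_t range_mask(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 32);

	/* All bits up to and including hi, avoiding a 32-bit shift. */
	uint32_t upto_hi = (hi == 31) ? UINT32_MAX : ((1u << (hi + 1)) - 1);

	/* Clear the bits below lo. */
	return upto_hi & ~((1u << lo) - 1);
}

/* Hypothetical helper: true if every bit of child is also set in parent. */
static int cbm_is_subset(uint32_t child, uint32_t parent)
{
	return (child & ~parent) == 0;
}
```

cgroupA and cgroupB do not overlap, and cgroupA-A's 0x1f is a subset of cgroupA's 0x7f; any other cgroup given bits [ 0 - 4 ] ends up with the very same 0x1f.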

Two ways to create a cgroup with bits [ 0 - 4 ] set:

1) Create a cgroup C at level 1 (under a different name) with bits
[ 0 - 4 ]. Useful, if you want the same cgroup under two different names.

2) Create a cgroup A-B under cgroupA with bits [ 0 - 4 ].

Having two or more cgroups with the same bits set, whether siblings or
at different levels of the hierarchy, just creates confusion
(I can't see any organizational benefit).

Why not return -EINVAL? Ah, cgroups are hierarchical, right.
