linux-kernel - Re: [PATCH 3/7] x86/intel_rdt: Add support for cache bit mask management

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.10.1505181105140.6658@vshiva-Udesk>
Date:	Mon, 18 May 2015 12:17:31 -0700 (PDT)
From:	Vikas Shivappa <vikas.shivappa@...el.com>
To:	Thomas Gleixner <tglx@...utronix.de>
cc:	Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
	vikas.shivappa@...el.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, hpa@...or.com, mingo@...nel.org,
	tj@...nel.org, peterz@...radead.org,
	Matt Fleming <matt.fleming@...el.com>,
	"Auld, Will" <will.auld@...el.com>, peter.zijlstra@...el.com,
	h.peter.anvin@...el.com,
	"Juvva, Kanaka D" <kanaka.d.juvva@...el.com>, mtosatti@...hat.com
Subject: Re: [PATCH 3/7] x86/intel_rdt: Add support for cache bit mask
 management



On Fri, 15 May 2015, Thomas Gleixner wrote:

> On Mon, 11 May 2015, Vikas Shivappa wrote:
>> @@ -35,6 +36,42 @@ static struct rdt_subsys_info rdtss_info;
>>  static DEFINE_MUTEX(rdt_group_mutex);
>>  struct intel_rdt rdt_root_group;
>>
>> +/*
>> + * Mask of CPUs for writing CBM values. We only need one per-socket.
>
> One mask? One CPU? One what? Please add comments which are
> understandable and useful for people who did NOT write that code.

when there is already code in the upstream code which has comments exactly 
consistent - when that made sense , this should ?

/*
  * Mask of CPUs for reading CQM values. We only need one per-socket.
  */
static cpumask_t cqm_cpumask;

>
>> + */
>> +static cpumask_t rdt_cpumask;
>
>> +static void __cpu_cbm_update(void *info)
>> +{
>> +	unsigned int closid = *((unsigned int *)info);
>
> So all you want is to transfer closid, right? Why having a pointer
> instead of just doing
>
> 	on_each_cpu_mask(...., (void *) id);
>
> and
> 	id = (unsigned long) id;

will fix.

>
> Looks too simple and readable, right?
>
>> +/*
>> + * cbm_update_msrs() - Updates all the existing IA32_L3_MASK_n MSRs
>> + * which are one per CLOSid, except IA32_L3_MASK_0 on the current package.
>> + * @cpu : the cpu on which the mask is updated.
>> + */
>> +static inline void cbm_update_msrs(int cpu)
>> +{
>> +	int maxid = boot_cpu_data.x86_rdt_max_closid;
>> +	unsigned int i;
>> +
>> +	if (WARN_ON(cpu != smp_processor_id()))
>> +		return;
>> +
>> +	for (i = 1; i < maxid; i++) {
>
> Lacks a comment why this starts with 1 and not with 0 as one would expect.

Its in the function comment just above - "except IA32_L3_MASK_0 on the current 
package."

>
>> +		if (ccmap[i].clos_refcnt)
>> +			__cpu_cbm_update(&i);
>> +	}
>> +}
>
> This is only used by the cpu hotplug code. So why does it not live
> close to it?

ok will fix

Because its more fun to search for functions than having
> them just in place?
>
>> +/*
>> + * intel_cache_alloc_cbm_write() - Validates and writes the
>> + * cache bit mask(cbm) to the IA32_L3_MASK_n
>> + * and also store the same in the ccmap.
>> + *
>> + * CLOSids are reused for cgroups which have same bitmask.
>> + * - This helps to use the scant CLOSids optimally.
>> + * - This also implies that at context switch write
>> + * to PQR-MSR is done only when a task with a
>> + * different bitmask is scheduled in.
>> + */
>> +static int intel_cache_alloc_cbm_write(struct cgroup_subsys_state *css,
>
> Can you please drop these intel_ prefixes? They provide no value and
> just eat space.

well , i got ample comments which wanted me to me to specify intel as this 
feature is specific to intel.
the cache monitoring code has similar prefixes as well - is that pref 
specific then ? I just dont want people to come back and ask to make it clear 
that things are intel specific

>
>> +				 struct cftype *cft, u64 cbmvalue)
>> +{
>> +	u32 max_cbm = boot_cpu_data.x86_rdt_max_cbm_len;
>> +	struct intel_rdt *ir = css_rdt(css);
>> +	unsigned long cache_mask, max_mask;
>> +	unsigned long *cbm_tmp;
>> +	unsigned int closid;
>> +	ssize_t err = 0;
>> +
>> +	if (ir == &rdt_root_group)
>> +		return -EPERM;
>> +	bitmap_set(&max_mask, 0, max_cbm);
>> +
>> +	/*
>> +	 * Need global mutex as cbm write may allocate a closid.
>> +	 */
>> +	mutex_lock(&rdt_group_mutex);
>> +	bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
>> +	cbm_tmp = &ccmap[ir->clos].cache_mask;
>> +
>> +	if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
>> +		goto out;
>> +
>
> Sigh, what's the point of this bitmap shuffling? Nothing as far as I
> can tell, because you first set all bits of maxmask and then AND
> cbmvalue.

Again , this contraditcs your earlier comment where you asked me to use the bit 
manipulation APIs to make it more *readable*. So do you want more readable now 
or is that not clear now ?

https://lkml.org/lkml/2014/11/19/1145

+	/* Reset the least significant bit.*/
+	i = var & (var - 1);
+
+	if (i != (var & (var << 1)))
+		return false;

was replaced with the

 	if (bitmap_weight(&var, maxcbm) < min_bitmask_len)
 		return false;

 	first_bit = find_next_bit(&var, maxcbm, 0);
 	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);

 	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
 		return false;

3 lines replaced with more to make it more readable ! so does this come back 
then ?

>
> All you achieve is, that you truncate cbmvalue to max_cbm bits. So if
> cbmvalue has any of the invalid bits set, you happily ignore that.

taskset ignores the extra bits. is that a problem here ?

>
> This whole bitmap usage is just making the code obfuscated.
>
>     u64 max_mask = (1ULL << max_cbm) - 1;
>
>     if (cbm_value & ~max_mask)
>     	   return -EINVAL;
>
>     if (cbm_value == (u64)ccmap[ir->clos].cache_mask)
>     	   return 0;
>
> No bitmap_set, bitmap_and nothing. Simple and readable, right?
>
> The bitmap would only be necessary, IF max_cbm is > 64. But that's not
> the case and nothing in your code would survive ma_ cbm > 64. And as I
> said before CBM_LEN is 32bit for now and I serioulsy doubt, that it
> can ever go over 64bit simply because that would not fit into the mask
> MSRs.
>
> So can you please get rid of that bitmap nonsense including the
> dynamic allocation and  use simple and readable code constructs?
>
> Just for the record:
>
>     struct bla {
>     	    u64			cbm_mask;
>     };
>
> versus:
>
>     struct bla {
>     	    unsigned long	*cbm_mask;
>     };
>
>   plus
>
>     bla.cbm_mask = kzalloc(sizeof(u64));
>
> Is just utter waste of data and text space.
>
>> +	/*
>> +	 * At this point we are sure to change the cache_mask.Hence release the
>> +	 * reference to the current CLOSid and try to get a reference for
>> +	 * a different CLOSid.
>> +	 */
>> +	__clos_put(ir->clos);
>
> So you drop the reference, before you make sure that you can get a new one?

I was horrible , will add fix. thanks for pointing out .

>
>> +	if (cbm_search(cache_mask, &closid)) {
>> +		ir->clos = closid;
>> +		__clos_get(closid);
>> +	} else {
>> +		err = intel_rdt_alloc_closid(ir);
>> +		if (err)
>> +			goto out;
>
> What happens if intel_rdt_alloc_closid() fails? ir->clos is unchanged,
> but you have no reference anymore. The next call to this function or
> anything else which calls __clos_put() will trigger the underflow
> warning and the warning will be completely useless because the
> wreckage happened before that.
>
>> +static inline bool intel_rdt_update_cpumask(int cpu)
>> +{
>> +	int phys_id = topology_physical_package_id(cpu);
>> +	struct cpumask *mask = &rdt_cpumask;
>> +	int i;
>> +
>> +	for_each_cpu(i, mask) {
>> +		if (phys_id == topology_physical_package_id(i))
>> +			return false;
>> +	}
>
> You must be kidding.

the rapl and cqm use similar code. You want me to keep a seperate 
package mask for this code which not would be that frequent at all ?

for c-state transitions only idle_notifier gets called - so this is only when a 
new package is physically added. and we dont need anything for cache alloc for 
idle transitions. not really frequent ?

You know how many packages are possible. What's
> wrong with having a bitmap of physical packages which have a CPU
> handling it?
>
> 	id =  topology_physical_package_id(cpu);
> 	if (!__test_and_set_bit(id, &package_map)) {
> 		cpumask_set_cpu(cpu, &rdt_cpumask);
> 		cbm_update_msrs(cpu);
> 	}
>
> Four lines of code without loops and hoops. And here is a bitmap the
> proper data structure.
>
>> +static void intel_rdt_cpu_exit(unsigned int cpu)
>> +{
>> +	int phys_id = topology_physical_package_id(cpu);
>> +	int i;
>> +
>> +	mutex_lock(&rdt_group_mutex);
>> +	if (!cpumask_test_and_clear_cpu(cpu, &rdt_cpumask)) {
>> +		mutex_unlock(&rdt_group_mutex);
>> +		return;
>> +	}
>> +
>> +	for_each_online_cpu(i) {
>> +		if (i == cpu)
>> +			continue;
>> +
>> +		if (phys_id == topology_physical_package_id(i)) {
>> +			cpumask_set_cpu(i, &rdt_cpumask);
>> +			break;
>> +		}
>> +	}
>
> There is a better way than iterating over all online cpus ....
>
>>
>> +static struct cftype rdt_files[] = {
>> +	{
>> +		.name = "cache_mask",
>> +		.seq_show = intel_cache_alloc_cbm_read,
>> +		.write_u64 = intel_cache_alloc_cbm_write,
>> +		.mode = 0666,
>> +	},
>> +	{ }	/* terminate */
>> +};
>
> Where is the Documentation for that file? What's the content, format,
> error codes ....

Will add section to rdt.txt in Documentation/cgroups - also will move the 
documentation patch before all these patches..

>
> Sigh.
>
> 	tglx
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/