lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <eca1657e-3dbf-d46c-fecf-f8888a25a878@intel.com>
Date:   Wed, 24 Aug 2022 14:16:32 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Babu Moger <babu.moger@....com>, <fenghua.yu@...el.com>,
        <tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>
CC:     <eranian@...gle.com>, <dave.hansen@...ux.intel.com>,
        <x86@...nel.org>, <hpa@...or.com>, <corbet@....net>,
        <linux-kernel@...r.kernel.org>, <linux-doc@...r.kernel.org>,
        <bagasdotme@...il.com>
Subject: Re: [PATCH v3 09/10] x86/resctrl: Add sysfs interface to write the
 event configuration

Hi Babu,

On 8/22/2022 6:44 AM, Babu Moger wrote:
> Add the sysfs interface to write the event configuration for the
> MBM configurable events. The event configuration can be changed by
> writing to the sysfs file for that specific event.
> 
> Following are the types of events supported.
> ==================================================================
> Bits    Description
> 6       Dirty Victims from the QOS domain to all types of memory
> 5       Reads to slow memory in the non-local NUMA domain
> 4       Reads to slow memory in the local NUMA domain
> 3       Non-temporal writes to non-local NUMA domain
> 2       Non-temporal writes to local NUMA domain
> 1       Reads to memory in the non-local NUMA domain
> 0       Reads to memory in the local NUMA domain
> 
> By default the mbm_total_bytes configuration is set to 0x7f to count
> all the types of events and mbm_local_bytes configuration is set to
> 0x15 to count all the local memory events.
> 
> For example:
> To change the mbm_total_bytes to count all the reads, run the command.
> $echo  0x33 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_config
> 
> To change the mbm_local_bytes to count all the slow memory reads, run
> the command.
> $echo  0x30 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_config
> 
> Signed-off-by: Babu Moger <babu.moger@....com>
> Reviewed-by: Ingo Molnar <mingo@...nel.org>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |  109 ++++++++++++++++++++++++++++++++
>  1 file changed, 109 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e1847d49fa15..83c8780726ff 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -323,9 +323,118 @@ int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>  	return ret;
>  }
>  
> +/*
> + * This is called via IPI to read the CQM/MBM counters
> + * in a domain.
> + */
> +void mon_event_config_write(void *info)
> +{
> +	union mon_data_bits *md = info;
> +	u32 evtid = md->u.evtid;
> +	u32 msr_index;
> +
> +	switch (evtid) {
> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
> +		msr_index = 0;
> +		break;
> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
> +		msr_index = 1;
> +		break;
> +	default:
> +		return; /* Not expected to come here */

Please no inline comments.

> +	}
> +
> +	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, md->u.mon_config, 0);
> +}
> +
> +ssize_t  rdtgroup_mondata_config_write(struct kernfs_open_file *of,
> +				       char *buf, size_t nbytes, loff_t off)
> +{
> +	struct rdt_hw_resource *hw_res;
> +	struct rdtgroup *rdtgrp;
> +	struct rdt_resource *r;
> +	unsigned int mon_config;
> +	cpumask_var_t cpu_mask;
> +	union mon_data_bits md;
> +	struct rdt_domain *d;
> +	u32 resid, domid;
> +	int ret = 0, cpu;
> +
> +	ret = kstrtouint(buf, 0, &mon_config);
> +	if (ret)
> +		return ret;
> +
> +	rdt_last_cmd_clear();
> +
> +	/* mon_config cannot be more than the supported set of events */
> +	if (mon_config > GENMASK(6, 0)) {

Could this "GENMASK()" be given a name and be moved closer to where
the bits are defined in internal.h? This will be easier to read and if any
new bits are added it would hopefully be noticed more easily to get
an update also.

> +		rdt_last_cmd_puts("Invalid event configuration\n");
> +		return -EINVAL;
> +	}
> +
> +	cpus_read_lock();
> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
> +	if (!rdtgrp) {
> +		return -ENOENT;
> +		goto e_unlock;
> +	}
> +
> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
> +		ret = -ENOMEM;
> +		goto e_unlock;
> +	}
> +
> +
> +	md.priv = of->kn->priv;
> +	resid = md.u.rid;
> +	domid = md.u.domid;
> +
> +	hw_res = &rdt_resources_all[resid];
> +	r = &hw_res->r_resctrl;
> +	d = rdt_find_domain(r, domid, NULL);
> +	if (IS_ERR_OR_NULL(d)) {
> +		ret = -ENOENT;
> +		goto e_cpumask;
> +	}
> +
> +	md.u.mon_config = mon_config & 0xFF;

Same question as previous patch. I do not see this internal
value used in the code, is storing it necessary?

> +
> +	/* Pick all the CPUs in the domain instance */
> +	for_each_cpu(cpu, &d->cpu_mask)
> +		cpumask_set_cpu(cpu, cpu_mask);
> +
> +	cpu = get_cpu();
> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on this cpu if it's in cpu_mask */

Please always use caps for CPU.

> +	if (cpumask_test_cpu(cpu, cpu_mask))
> +		mon_event_config_write(&md);
> +
> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on all other cpus in cpu_mask */
> +	smp_call_function_many(cpu_mask, mon_event_config_write, &md, 1);
> +	put_cpu();

I do not think we need to propagate this pattern more in the resctrl code.
How about on_each_cpu_mask()? 

> +
> +	/*
> +	 * When an Event Configuration is changed, the bandwidth counters
> +	 * for all RMIDs and Events will be cleared, and the U-bit for every
> +	 * RMID will be set on the next read to any BwEvent for every RMID.
> +	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
> +	 */
> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);

Could you please check if this is sufficient? Note how "mon_event_read()"
is called with "first = true" right after the mon sysfs files are created to
clear the state _and_ initialize the state to set the "prev" MSRs correctly.


> +
> +e_cpumask:
> +	free_cpumask_var(cpu_mask);
> +
> +e_unlock:
> +	rdtgroup_kn_unlock(of->kn);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  static const struct kernfs_ops kf_mondata_config_ops = {
>  	.atomic_write_len       = PAGE_SIZE,
>  	.seq_show               = rdtgroup_mondata_config_show,
> +	.write                  = rdtgroup_mondata_config_write,
>  };
>  
>  static bool is_cpu_list(struct kernfs_open_file *of)
> 
> 

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ