linux-kernel - Re: [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write the event configuration

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3511f4f6-d043-9a22-7779-af2c2983b6a2@amd.com>
Date:   Mon, 19 Sep 2022 11:50:15 -0500
From:   "Moger, Babu" <babu.moger@....com>
To:     Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de
Cc:     fenghua.yu@...el.com, dave.hansen@...ux.intel.com, x86@...nel.org,
        hpa@...or.com, paulmck@...nel.org, akpm@...ux-foundation.org,
        quic_neeraju@...cinc.com, rdunlap@...radead.org,
        damien.lemoal@...nsource.wdc.com, songmuchun@...edance.com,
        peterz@...radead.org, jpoimboe@...nel.org, pbonzini@...hat.com,
        chang.seok.bae@...el.com, pawan.kumar.gupta@...ux.intel.com,
        jmattson@...gle.com, daniel.sneddon@...ux.intel.com,
        sandipan.das@....com, tony.luck@...el.com, james.morse@....com,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        bagasdotme@...il.com, eranian@...gle.com
Subject: Re: [PATCH v4 11/13] x86/resctrl: Add sysfs interface to write the
 event configuration


On 9/16/22 11:17, Reinette Chatre wrote:
> Hi Babu,
>
> On 9/7/2022 11:01 AM, Babu Moger wrote:
>
> ...
>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6f067c1ac7c1..59b484eb1267 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -330,9 +330,121 @@ int rdtgroup_mondata_config_show(struct seq_file *m, void *arg)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * This is called via IPI to read the CQM/MBM counters
>> + * in a domain.
> copy&paste from previous patch?
Will correct it.
>
>> + */
>> +void mon_event_config_write(void *info)
>> +{
>> +	struct mon_config_info *mon_info = info;
>> +	u32 msr_index;
>> +
>> +	switch (mon_info->evtid) {
>> +	case QOS_L3_MBM_TOTAL_EVENT_ID:
>> +		msr_index = 0;
>> +		break;
>> +	case QOS_L3_MBM_LOCAL_EVENT_ID:
>> +		msr_index = 1;
>> +		break;
>> +	default:
>> +		/* Not expected to come here */
>> +		return;
>> +	}
>> +
>> +	wrmsr(MSR_IA32_EVT_CFG_BASE + msr_index, mon_info->mon_config, 0);
>> +}
>> +
>> +ssize_t  rdtgroup_mondata_config_write(struct kernfs_open_file *of,
>> +				       char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct mon_config_info mon_info;
>> +	struct rdt_hw_resource *hw_res;
>> +	struct rdtgroup *rdtgrp;
>> +	struct rdt_resource *r;
>> +	unsigned int mon_config;
>> +	cpumask_var_t cpu_mask;
>> +	union mon_data_bits md;
>> +	struct rdt_domain *d;
>> +	u32 resid, domid;
>> +	int ret = 0, cpu;
>> +
>> +	ret = kstrtouint(buf, 0, &mon_config);
>> +	if (ret)
>> +		return ret;
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	/* mon_config cannot be more than the supported set of events */
>> +	if (mon_config > MAX_EVT_CONFIG_BITS) {
>> +		rdt_last_cmd_puts("Invalid event configuration\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	cpus_read_lock();
>> +	rdtgrp = rdtgroup_kn_lock_live(of->kn);
>> +	if (!rdtgrp) {
>> +		return -ENOENT;
>> +		goto e_unlock;
>> +	}
>> +
>> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
>> +		ret = -ENOMEM;
>> +		goto e_unlock;
>> +	}
>> +
>> +	md.priv = of->kn->priv;
>> +	resid = md.u.rid;
>> +	domid = md.u.domid;
>> +
>> +	hw_res = &rdt_resources_all[resid];
>> +	r = &hw_res->r_resctrl;
>> +	d = rdt_find_domain(r, domid, NULL);
>> +	if (IS_ERR_OR_NULL(d)) {
>> +		ret = -ENOENT;
>> +		goto e_cpumask;
>> +	}
>> +
>> +	/*
>> +	 * Read the current config value first. If both are same
>> +	 * then we dont need to write again
>> +	 */
>> +	mon_info.evtid = md.u.evtid;
>> +	mondata_config_read(d, &mon_info);
>> +	if (mon_info.mon_config == mon_config)
>> +		goto e_cpumask;
>> +
>> +	mon_info.mon_config = mon_config;
>> +
>> +	/* Pick all the CPUs in the domain instance */
>> +	for_each_cpu(cpu, &d->cpu_mask)
>> +		cpumask_set_cpu(cpu, cpu_mask);
>> +
>> +	/* Update MSR_IA32_EVT_CFG_BASE MSR on all the CPUs in cpu_mask */
>> +	on_each_cpu_mask(cpu_mask, mon_event_config_write, &mon_info, 1);
> If this is required then could you please add a comment why every CPU in
> the domain needs to be updated? Until now configuration changes
> only needed to be made on one CPU per domain. 
> Even in the previous patch when the user reads the current configuration
> value it is only done on one CPU in the domain ... to me that implies
> that the scope is per domain and only one CPU in the domain needs to be
> changed.

Yes, This register is domain specific. However the hardware team
recommends it to update on all the CPU threads. It is not clear in the
specs right now.

I have asked them to make that clear in the specs next time.

>
>> +
>> +	/*
>> +	 * When an Event Configuration is changed, the bandwidth counters
>> +	 * for all RMIDs and Events will be cleared, and the U-bit for every
>> +	 * RMID will be set on the next read to any BwEvent for every RMID.
>> +	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
>> +	 */
> This is a snippet that was copied from the hardware spec and since it is
> inserted into the kernel driver the context makes it hard to understand.
> Could it be translated into what it means in this context?
> Perhaps something like:
>
> 	/*
> 	 * When an Event Configuration is changed, the bandwidth counters
> 	 * for all RMIDs and Events will be cleared by the hardware. The
>          * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for every
> 	 * RMID on the next read to any event for every RMID. Subsequent
> 	 * reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) cleared
> 	 * while it is tracked by the hardware.
> 	 * Clear the mbm_local and mbm_total counts for all the RMIDs.
> 	 */
>
> Please fixup where I got it wrong.

Looks good.

Thanks Babu

>
>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
>> +
> Reinette

-- 
Thanks
Babu Moger