lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 13 Dec 2022 23:46:43 +0000
From:   "Moger, Babu" <Babu.Moger@....com>
To:     James Morse <james.morse@....com>
CC:     "fenghua.yu@...el.com" <fenghua.yu@...el.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "paulmck@...nel.org" <paulmck@...nel.org>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "quic_neeraju@...cinc.com" <quic_neeraju@...cinc.com>,
        "rdunlap@...radead.org" <rdunlap@...radead.org>,
        "damien.lemoal@...nsource.wdc.com" <damien.lemoal@...nsource.wdc.com>,
        "songmuchun@...edance.com" <songmuchun@...edance.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "jpoimboe@...nel.org" <jpoimboe@...nel.org>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "chang.seok.bae@...el.com" <chang.seok.bae@...el.com>,
        "pawan.kumar.gupta@...ux.intel.com" 
        <pawan.kumar.gupta@...ux.intel.com>,
        "jmattson@...gle.com" <jmattson@...gle.com>,
        "daniel.sneddon@...ux.intel.com" <daniel.sneddon@...ux.intel.com>,
        "Das1, Sandipan" <Sandipan.Das@....com>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "bagasdotme@...il.com" <bagasdotme@...il.com>,
        "eranian@...gle.com" <eranian@...gle.com>,
        "corbet@....net" <corbet@....net>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "reinette.chatre@...el.com" <reinette.chatre@...el.com>
Subject: RE: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
 mbm_total_bytes_config

[AMD Official Use Only - General]

Hi James,

> -----Original Message-----
> From: James Morse <james.morse@....com>
> Sent: Tuesday, December 13, 2022 11:55 AM
> To: Moger, Babu <Babu.Moger@....com>
> Cc: fenghua.yu@...el.com; dave.hansen@...ux.intel.com; x86@...nel.org;
> hpa@...or.com; paulmck@...nel.org; akpm@...ux-foundation.org;
> quic_neeraju@...cinc.com; rdunlap@...radead.org;
> damien.lemoal@...nsource.wdc.com; songmuchun@...edance.com;
> peterz@...radead.org; jpoimboe@...nel.org; pbonzini@...hat.com;
> chang.seok.bae@...el.com; pawan.kumar.gupta@...ux.intel.com;
> jmattson@...gle.com; daniel.sneddon@...ux.intel.com; Das1, Sandipan
> <Sandipan.Das@....com>; tony.luck@...el.com; linux-doc@...r.kernel.org;
> linux-kernel@...r.kernel.org; bagasdotme@...il.com; eranian@...gle.com;
> corbet@....net; tglx@...utronix.de; mingo@...hat.com; bp@...en8.de;
> reinette.chatre@...el.com
> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write
> mbm_total_bytes_config
> 
> Hi Babu,
> 
> On 08/12/2022 00:02, Moger, Babu wrote:
> > [AMD Official Use Only - General]
> >> -----Original Message-----
> >> From: James Morse <james.morse@....com>
> >> Sent: Wednesday, December 7, 2022 11:21 AM
> >> To: Moger, Babu <Babu.Moger@....com>
> >> Cc: fenghua.yu@...el.com; dave.hansen@...ux.intel.com;
> >> x86@...nel.org; hpa@...or.com; paulmck@...nel.org;
> >> akpm@...ux-foundation.org; quic_neeraju@...cinc.com;
> >> rdunlap@...radead.org; damien.lemoal@...nsource.wdc.com;
> >> songmuchun@...edance.com; peterz@...radead.org;
> jpoimboe@...nel.org;
> >> pbonzini@...hat.com; chang.seok.bae@...el.com;
> >> pawan.kumar.gupta@...ux.intel.com;
> >> jmattson@...gle.com; daniel.sneddon@...ux.intel.com; Das1, Sandipan
> >> <Sandipan.Das@....com>; tony.luck@...el.com;
> >> linux-doc@...r.kernel.org; linux-kernel@...r.kernel.org;
> >> bagasdotme@...il.com; eranian@...gle.com; corbet@....net;
> >> tglx@...utronix.de; mingo@...hat.com; bp@...en8.de;
> >> reinette.chatre@...el.com
> >> Subject: Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to
> >> write mbm_total_bytes_config
> 
> >> On 04/11/2022 20:01, Babu Moger wrote:
> >>> The current event configuration for mbm_total_bytes can be changed
> >>> by the user by writing to the file
> >>> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
> >>>
> >>> The event configuration settings are domain specific and will affect
> >>> all the CPUs in the domain.
> >>>
> >>> Following are the types of events supported:
> >>>
> >>> ====
> >> ===========================================================
> >>> Bits   Description
> >>> ====
> >> ===========================================================
> >>> 6      Dirty Victims from the QOS domain to all types of memory
> >>> 5      Reads to slow memory in the non-local NUMA domain
> >>> 4      Reads to slow memory in the local NUMA domain
> >>> 3      Non-temporal writes to non-local NUMA domain
> >>> 2      Non-temporal writes to local NUMA domain
> >>> 1      Reads to memory in the non-local NUMA domain
> >>> 0      Reads to memory in the local NUMA domain
> >>> ====
> >> ===========================================================
> >>>
> >>> For example:
> >>> To change the mbm_total_bytes to count only reads on domain 0, the
> >>> bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33).
> >>> Run the command.
> >>> 	$echo  0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >>>
> >>> To change the mbm_total_bytes to count all the slow memory reads on
> >>> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> >>> Run the command.
> >>> 	$echo  1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> >>
> >>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> >>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> >>> index 18f9588a41cf..0cdccb69386e 100644
> >>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> >>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> >>> @@ -1505,6 +1505,133 @@ static int
> >>> mbm_local_bytes_config_show(struct
> >> kernfs_open_file *of,
> >>>  	return 0;
> >>>  }
> >>>
> >>> +static void mon_event_config_write(void *info) {
> >>> +	struct mon_config_info *mon_info = info;
> >>> +	u32 index;
> >>> +
> >>> +	index = mon_event_config_index_get(mon_info->evtid);
> >>> +	if (index >= MAX_CONFIG_EVENTS) {
> >>> +		pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> >>> +		return;
> >>> +	}
> >>> +	wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> >> }
> >>> +
> >>> +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> >>> +			    u32 evtid, u32 val)
> >>> +{
> >>> +	struct mon_config_info mon_info = {0};
> >>> +	int ret = 0;
> >>> +
> >>> +	rdt_last_cmd_clear();
> >>> +
> >>> +	/* mon_config cannot be more than the supported set of events */
> >>> +	if (val > MAX_EVT_CONFIG_BITS) {
> >>> +		rdt_last_cmd_puts("Invalid event configuration\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	/*
> >>> +	 * Read the current config value first. If both are same then
> >>> +	 * we don't need to write it again.
> >>> +	 */
> >>> +	mon_info.evtid = evtid;
> >>
> >>> +	mondata_config_read(d, &mon_info);
> >>
> >> This reads the MSR on this CPU, which gets the result for this domain...
> >
> > [1] No. This read happens at the target domain.
> 
> Oops ... looks like I muddled that with mon_event_config_read().
> 
> 
> > static void mondata_config_read(struct rdt_domain *d, struct
> > mon_config_info *mon_info) {
> >         smp_call_function_any(&d->cpu_mask, mon_event_config_read,
> > mon_info, 1); }
> 
> >>> +	if (mon_info.mon_config == val)
> >>> +		goto write_exit;
> >>> +
> >>> +	mon_info.mon_config = val;
> >>> +
> >>> +	/*
> >>> +	 * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> >>> +	 * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> >>> +	 * are scoped at the domain level. Writing any of these MSRs
> >>> +	 * on one CPU is supposed to be observed by all CPUs in the
> >>> +	 * domain. However, the hardware team recommends to update
> >>> +	 * these MSRs on all the CPUs in the domain.
> >>> +	 */
> >>
> >>> +	on_each_cpu_mask(&d->cpu_mask, mon_event_config_write,
> >> &mon_info,
> >>> +1);
> >>
> >> ... but here you IPI all the CPUs in the target domain to update them.
> 
> > [2] There have been some changes in this area recently. The
> > requirement of writing the value on all the CPUs in the domain is not
> > required anymore. I am working on verifying this right now.  If
> > everything works, then I can do smp_call_function_any(&d->cpu_mask,
> > mon_event_config_write,  &mon_info, 1);
> >
> > I will confirm this soon.
> 
> Okay, that makes my next question more confusing then ....
> 
> 
> >> This means you unnecessarily IPI the CPUs in the target domain if
> >> they already had this value, but the write syscall occurred on a
> >> domain that differs. This isn't what you intended, but its benign.
> >> More of a problem is: Won't this get skipped if the write syscall
> >> occurs on a domain that happens to have the target configuration already?
> 
> > Do you still think this is a problem after my comment [1] above?  Or Am I
> missing something?
> 
> I'd muddled two similarly named functions. Sorry for the noise!

No worries. It made me look closely. 
> 
> I think what you're left with is the question "What is the monitor config for
> CPUs that were offline when it was last changed?". If its preserved by the CPU,
> then its some unknown value, and needs to be made the same as the value
> user-space/the-domain currently expects.
> 
> If there is only one config value for the domain (as your comment above
> suggests), then nothing needs doing here.

Ok. Thanks

> 
> 
> >> Because you need the same value to be written on every CPU ... what
> >> happens to CPUs that are offline when the configuration is changed?
> >> Do they keep their previous value, or does it get reset?
> >
> > The contents of this MSR register are held outside of all the cores.
> > If the value changes while a cpu is offline, and it reads it once it comes
> online, it will get the new value.
> 
> This fits with your new description of the value only needing to be written from
> one CPU in the domain.

Yes. Will change the comment about one CPU. 
Still waiting for the comment for the whole series from Reinette before I re-spin the next version.
> 
> 
> >> I think this is best solved with a percpu variable for the current
> >> value of the MSR. You can then read it for CPUs in a remote domain,
> >> and only issue IPIs to 'sync' the value if needed. You can then
> >> re-use the sync call in
> >> resctrl_online_cpu() to set the MSR to whatever value it should currently be.
> >
> > This may not be required with my comment 1 and 2 above.
> >
> >>
> >>
> >>> +
> >>> +	/*
> >>> +	 * When an Event Configuration is changed, the bandwidth counters
> >>> +	 * for all RMIDs and Events will be cleared by the hardware. The
> >>> +	 * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> >>> +	 * every RMID on the next read to any event for every RMID.
> >>> +	 * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> >>> +	 * cleared while it is tracked by the hardware. Clear the
> >>> +	 * mbm_local and mbm_total counts for all the RMIDs.
> >>> +	 */
> >>> +	memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> >>> +	memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> >>> +
> >>> +write_exit:
> >>> +	return ret;
> >>> +}
> >>
> >>
> >>> +static int mon_config_parse(struct rdt_resource *r, char *tok, u32
> >>> +evtid) {
> >>> +	char *dom_str = NULL, *id_str;
> >>> +	unsigned long dom_id, val;
> >>> +	struct rdt_domain *d;
> >>> +	int ret = 0;
> >>> +
> >>> +next:
> >>> +	if (!tok || tok[0] == '\0')
> >>> +		return 0;
> >>> +
> >>> +	/* Start processing the strings for each domain */
> >>> +	dom_str = strim(strsep(&tok, ";"));
> >>> +	id_str = strsep(&dom_str, "=");
> >>> +
> >>> +	if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> >>> +		rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> >>> +		rdt_last_cmd_puts("Missing '=' or non-numeric event
> >> configuration value\n");
> >>> +		return -EINVAL;
> >>> +	}
> >>
> >> This is parsing the same format strings as parse_line(). Is there any
> >> chance that code could be re-used instead of duplicated? This way
> >> anything that is added to the format (or bugs found!) only need supporting in
> once place.
> >
> > I have checked on reusing the parse_line. The parse_line is
> > specifically written for schemata.  We can't reuse parse_line without
> changing it completely.
> 
> I agree its a little more complicated than it looked at first. I might have a go at
> it later...

Ok Thanks
Babu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ