linux-kernel - Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1ccb907b-e8c9-4997-bc45-4a457ee84494@amd.com>
Date: Wed, 19 Feb 2025 18:34:42 -0600
From: "Moger, Babu" <bmoger@....com>
To: Dave Martin <Dave.Martin@....com>, Babu Moger <babu.moger@....com>
Cc: corbet@....net, reinette.chatre@...el.com, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
 tony.luck@...el.com, peternewman@...gle.com, x86@...nel.org, hpa@...or.com,
 paulmck@...nel.org, akpm@...ux-foundation.org, thuth@...hat.com,
 rostedt@...dmis.org, xiongwei.song@...driver.com,
 pawan.kumar.gupta@...ux.intel.com, daniel.sneddon@...ux.intel.com,
 jpoimboe@...nel.org, perry.yuan@....com, sandipan.das@....com,
 kai.huang@...el.com, xiaoyao.li@...el.com, seanjc@...gle.com,
 xin3.li@...el.com, andrew.cooper3@...rix.com, ebiggers@...gle.com,
 mario.limonciello@....com, james.morse@....com, tan.shaopeng@...itsu.com,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 maciej.wieczor-retman@...el.com, eranian@...gle.com
Subject: Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify
 assignment states of the groups

Hi Dave,

On 2/19/2025 10:07 AM, Dave Martin wrote:
> Hi,
> 
> On Wed, Jan 22, 2025 at 02:20:31PM -0600, Babu Moger wrote:
>> When mbm_cntr_assign mode is enabled, users can designate which of the MBM
>> events in the CTRL_MON or MON groups should have counters assigned.
>>
>> Provide an interface for assigning MBM events by writing to the file:
>> /sys/fs/resctrl/info/L3_MON/mbm_assign_control. Using this interface,
>> events can be assigned or unassigned as needed.
>>
>> Format is similar to the list format with addition of opcode for the
>> assignment operation.
>>   "<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>"
> 
> [...]
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 6e29827239e0..299839bcf23f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
> 
> [...]
> 
>> +static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
>> +						char *buf, size_t nbytes, loff_t off)
>> +{
>> +	struct rdt_resource *r = of->kn->parent->priv;
>> +	char *token, *cmon_grp, *mon_grp;
>> +	enum rdt_group_type rtype;
>> +	int ret;
>> +
>> +	/* Valid input requires a trailing newline */
>> +	if (nbytes == 0 || buf[nbytes - 1] != '\n')
>> +		return -EINVAL;
>> +
>> +	buf[nbytes - 1] = '\0';
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
>> +		rdt_last_cmd_puts("mbm_cntr_assign mode is not enabled\n");
>> +		mutex_unlock(&rdtgroup_mutex);
>> +		cpus_read_unlock();
>> +		return -EINVAL;
>> +	}
>> +
>> +	while ((token = strsep(&buf, "\n")) != NULL) {
>> +		/*
>> +		 * The write command follows the following format:
>> +		 * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
>> +		 * Extract the CTRL_MON group.
>> +		 */
>> +		cmon_grp = strsep(&token, "/");
>> +
> 
> As when reading this file, I think that the data can grow larger than a
> page and get split into multiple write() calls.
> 
> I don't currently think the file needs to be redesigned, but there are
> some concerns about how userspace will work with it that need to be
> sorted out.
> 
> Every monitoring group can contribute a line to this file:
> 
> 	CTRL_GROUP / MON_GROUP / DOMAIN = [t][l] [ ; DOMAIN = [t][l] ]* LF
> 
> so, 2 * (NAME_MAX + 1) + NUM_DOMAINS * 5 - 1 + 1
> 
> NAME_MAX on Linux is 255, so with, say, up to 16 domains, that's about
> 600 bytes per monitoring group in the worst case.
> 
> We don't need to have many control and monitoring groups for this to
> grow potentially over 4K.
> 
> 
> We could simply place a limit on how much userspace is allowed to write
> to this file in one go, although this restriction feels difficult for
> userspace to follow -- but maybe this is workable in the short term, on
> current systems (?)
> 
> Otherwise, since we expect this interface to be written using scripting
> languages, I think we need to be prepared to accept fully-buffered
> I/O.  That means that the data may be cut at random places, not
> necessarily at newlines.  (For smaller files such as schemata this is
> not such an issue, since the whole file is likely to be small enough to
> fit into the default stdio buffers -- this is how sysfs gets away with
> it IIUC.)
> 
> For fully-buffered I/O, we may have to cache an incomplete line in
> between write() calls.  If there is a dangling incomplete line when the
> file is closed then it is hard to tell userspace, because people often
> don't bother to check the return value of close(), fclose() etc.
> However, since it's an ABI violation for userspace to end this file
> with a partial line, I think it's sufficient to report that via
> last_cmd_status.  (Making close() return -EIO still seems a good idea
> though, just in case userspace is listening.)

Seems like we can add a check in resctrl_mbm_assign_control_write() to 
compare nbytes > PAGE_SIZE.

But do we really need this? I have no way of testing this. Help me 
understand.

All these file operations go thru generic call kernfs_fop_write_iter(). 
Doesn't it take care of buffer check and overflow?


> 
> I hacked up something a bit like this so that schemata could be written
> interactively from the shell, so I can try to port that onto this series
> as an illustration, if it helps.
> 
> Cheers
> ---Dave
> 

Thanks
Babu