linux-kernel - Re: [PATCH v11 22/23] x86/resctrl: Introduce interface to list assignment states of all the groups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7802f9e9-9a63-463d-a51e-e9ad0e60f77f@amd.com>
Date: Thu, 20 Feb 2025 15:29:12 -0600
From: "Moger, Babu" <babu.moger@....com>
To: Dave Martin <Dave.Martin@....com>
Cc: corbet@....net, reinette.chatre@...el.com, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
 tony.luck@...el.com, peternewman@...gle.com, x86@...nel.org, hpa@...or.com,
 paulmck@...nel.org, akpm@...ux-foundation.org, thuth@...hat.com,
 rostedt@...dmis.org, xiongwei.song@...driver.com,
 pawan.kumar.gupta@...ux.intel.com, daniel.sneddon@...ux.intel.com,
 jpoimboe@...nel.org, perry.yuan@....com, sandipan.das@....com,
 kai.huang@...el.com, xiaoyao.li@...el.com, seanjc@...gle.com,
 xin3.li@...el.com, andrew.cooper3@...rix.com, ebiggers@...gle.com,
 mario.limonciello@....com, james.morse@....com, tan.shaopeng@...itsu.com,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 maciej.wieczor-retman@...el.com, eranian@...gle.com
Subject: Re: [PATCH v11 22/23] x86/resctrl: Introduce interface to list
 assignment states of all the groups

Hi Dave,

On 2/20/25 09:44, Dave Martin wrote:
> Hi,
> 
> On Wed, Feb 19, 2025 at 03:09:51PM -0600, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 2/19/25 07:53, Dave Martin wrote:
>>> On Wed, Jan 22, 2025 at 02:20:30PM -0600, Babu Moger wrote:
>>>> Provide the interface to list the assignment states of all the resctrl
>>>> groups in mbm_cntr_assign mode.
> 
> [...]
> 
>>>> +static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
>>>> +					   struct seq_file *s, void *v)
>>>> +{
> 
> [...]
> 
>>> Unlike the other resctrl files, it looks like the total size of this
>>> data will scale up with the number of existing monitoring groups
>>> and the lengths of the group names (in addition to the number of
>>> monitoring domains).
>>>
>>> So, this can easily be more than a page, overflowing internal limits
>>> in the seq_file and kernfs code.
>>>
>>> Do we need to track some state between read() calls?  This can be done
>>> by overriding the kernfs .open() and .release() methods and hanging
>>> some state data (or an rdtgroup_file pointer) on of->priv.
>>>
>>> Also, if we allow the data to be read out in chunks, then we would
>>> either have to snapshot all the data in one go and stash the unread
>>> tail in the kernel, or we would need to move over to RCU-based
>>> enumeration or similar -- otherwise releasing rdtgroup_mutex in the
>>> middle of the enumeration in order to return data to userspace is going
>>> to be a problem...
>>
>> Good catch.
>>
>> I see similar buffer overflow is handled by calling seq_buf_clear()
>> (look at process_durations() or in show_user_instructions()).
>>
>> How about handling this by calling rdt_last_cmd_clear() before printing
>> each group?
> 
> Does this work?
> 
> Once seq_buf_has_overflowed() returns nonzero, data has been lost, no?
> So far as I can see, show_user_instructions() just gives up on printing
> the affected line, while process_durations() tries to anticipate
> overflow and prints out the accumulated text to dmesg before clearing
> the buffer.

Yea. Agree,

> 
> In our case, we cannot send more data to userspace than was requested
> in the read() call, so we might have nowhere to drain the seq_buf
> contents to in order to free up space.
> 
> sysfs "expects" userspace to do a big enough read() that this problem
> doesn't happen.  In practice this is OK because people usually read
> through a buffered I/O layer like stdio, and in realistic
> implementations the user-side I/O buffer is large enough to hide this
> issue.
> 
> But mbm_assign_control data is dynamically generated and potentially
> much bigger than a typical sysfs file.

I have no idea how to handle this case. We may have to live with this
problem. Let us know if there are any ideas.

> 
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 484d6009869f..1828f59eb723 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1026,6 +1026,7 @@ static int resctrl_mbm_assign_control_show(struct
>> kernfs_open_file *of,
>>         }
>>
>>         list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> +               rdt_last_cmd_clear();
>>                 seq_printf(s, "%s//", rdtg->kn->name);
>>
>>                 sep = false;
>> @@ -1041,6 +1042,7 @@ static int resctrl_mbm_assign_control_show(struct
>> kernfs_open_file *of,
>>                 seq_putc(s, '\n');
>>
>>                 list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> mon.crdtgrp_list) {
>> +                       rdt_last_cmd_clear();
> 
> I don't see how this helps.
> 
> Surely last_cmd_status has nothing to do with s?

Correct. Clearly, I misunderstood the problem.

> 
> [...]
> 
> Cheers
> ---Dave
> 

-- 
Thanks
Babu Moger