[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7802f9e9-9a63-463d-a51e-e9ad0e60f77f@amd.com>
Date: Thu, 20 Feb 2025 15:29:12 -0600
From: "Moger, Babu" <babu.moger@....com>
To: Dave Martin <Dave.Martin@....com>
Cc: corbet@....net, reinette.chatre@...el.com, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
tony.luck@...el.com, peternewman@...gle.com, x86@...nel.org, hpa@...or.com,
paulmck@...nel.org, akpm@...ux-foundation.org, thuth@...hat.com,
rostedt@...dmis.org, xiongwei.song@...driver.com,
pawan.kumar.gupta@...ux.intel.com, daniel.sneddon@...ux.intel.com,
jpoimboe@...nel.org, perry.yuan@....com, sandipan.das@....com,
kai.huang@...el.com, xiaoyao.li@...el.com, seanjc@...gle.com,
xin3.li@...el.com, andrew.cooper3@...rix.com, ebiggers@...gle.com,
mario.limonciello@....com, james.morse@....com, tan.shaopeng@...itsu.com,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
maciej.wieczor-retman@...el.com, eranian@...gle.com
Subject: Re: [PATCH v11 22/23] x86/resctrl: Introduce interface to list
assignment states of all the groups
Hi Dave,
On 2/20/25 09:44, Dave Martin wrote:
> Hi,
>
> On Wed, Feb 19, 2025 at 03:09:51PM -0600, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 2/19/25 07:53, Dave Martin wrote:
>>> On Wed, Jan 22, 2025 at 02:20:30PM -0600, Babu Moger wrote:
>>>> Provide the interface to list the assignment states of all the resctrl
>>>> groups in mbm_cntr_assign mode.
>
> [...]
>
>>>> +static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
>>>> + struct seq_file *s, void *v)
>>>> +{
>
> [...]
>
>>> Unlike the other resctrl files, it looks like the total size of this
>>> data will scale up with the number of existing monitoring groups
>>> and the lengths of the group names (in addition to the number of
>>> monitoring domains).
>>>
>>> So, this can easily be more than a page, overflowing internal limits
>>> in the seq_file and kernfs code.
>>>
>>> Do we need to track some state between read() calls? This can be done
>>> by overriding the kernfs .open() and .release() methods and hanging
>>> some state data (or an rdtgroup_file pointer) on of->priv.
>>>
>>> Also, if we allow the data to be read out in chunks, then we would
>>> either have to snapshot all the data in one go and stash the unread
>>> tail in the kernel, or we would need to move over to RCU-based
>>> enumeration or similar -- otherwise releasing rdtgroup_mutex in the
>>> middle of the enumeration in order to return data to userspace is going
>>> to be a problem...
>>
>> Good catch.
>>
>> I see similar buffer overflow is handled by calling seq_buf_clear()
>> (look at process_durations() or in show_user_instructions()).
>>
>> How about handling this by calling rdt_last_cmd_clear() before printing
>> each group?
>
> Does this work?
>
> Once seq_buf_has_overflowed() returns nonzero, data has been lost, no?
> So far as I can see, show_user_instructions() just gives up on printing
> the affected line, while process_durations() tries to anticipate
> overflow and prints out the accumulated text to dmesg before clearing
> the buffer.
Yea. Agree,
>
> In our case, we cannot send more data to userspace than was requested
> in the read() call, so we might have nowhere to drain the seq_buf
> contents to in order to free up space.
>
> sysfs "expects" userspace to do a big enough read() that this problem
> doesn't happen. In practice this is OK because people usually read
> through a buffered I/O layer like stdio, and in realistic
> implementations the user-side I/O buffer is large enough to hide this
> issue.
>
> But mbm_assign_control data is dynamically generated and potentially
> much bigger than a typical sysfs file.
I have no idea how to handle this case. We may have to live with this
problem. Let us know if there are any ideas.
>
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 484d6009869f..1828f59eb723 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -1026,6 +1026,7 @@ static int resctrl_mbm_assign_control_show(struct
>> kernfs_open_file *of,
>> }
>>
>> list_for_each_entry(rdtg, &rdt_all_groups, rdtgroup_list) {
>> + rdt_last_cmd_clear();
>> seq_printf(s, "%s//", rdtg->kn->name);
>>
>> sep = false;
>> @@ -1041,6 +1042,7 @@ static int resctrl_mbm_assign_control_show(struct
>> kernfs_open_file *of,
>> seq_putc(s, '\n');
>>
>> list_for_each_entry(crg, &rdtg->mon.crdtgrp_list,
>> mon.crdtgrp_list) {
>> + rdt_last_cmd_clear();
>
> I don't see how this helps.
>
> Surely last_cmd_status has nothing to do with s?
Correct. Clearly, I misunderstood the problem.
>
> [...]
>
> Cheers
> ---Dave
>
--
Thanks
Babu Moger
Powered by blists - more mailing lists