Message-ID: <09f2f512-0428-4649-b8ef-33e5a03d5dcb@intel.com>
Date: Fri, 21 Feb 2025 12:10:44 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>, "Moger, Babu" <babu.moger@....com>
CC: <corbet@....net>, <tglx@...utronix.de>, <mingo@...hat.com>,
<bp@...en8.de>, <dave.hansen@...ux.intel.com>, <tony.luck@...el.com>,
<peternewman@...gle.com>, <x86@...nel.org>, <hpa@...or.com>,
<paulmck@...nel.org>, <akpm@...ux-foundation.org>, <thuth@...hat.com>,
<rostedt@...dmis.org>, <xiongwei.song@...driver.com>,
<pawan.kumar.gupta@...ux.intel.com>, <daniel.sneddon@...ux.intel.com>,
<jpoimboe@...nel.org>, <perry.yuan@....com>, <sandipan.das@....com>,
<kai.huang@...el.com>, <xiaoyao.li@...el.com>, <seanjc@...gle.com>,
<xin3.li@...el.com>, <andrew.cooper3@...rix.com>, <ebiggers@...gle.com>,
<mario.limonciello@....com>, <james.morse@....com>,
<tan.shaopeng@...itsu.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <maciej.wieczor-retman@...el.com>,
<eranian@...gle.com>
Subject: Re: [PATCH v11 22/23] x86/resctrl: Introduce interface to list
assignment states of all the groups

Hi Dave,

On 2/21/25 8:00 AM, Dave Martin wrote:
> On Thu, Feb 20, 2025 at 03:29:12PM -0600, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 2/20/25 09:44, Dave Martin wrote:
>>> Hi,
>>>
>>> On Wed, Feb 19, 2025 at 03:09:51PM -0600, Moger, Babu wrote:
>
> [...]
>
>>>> Good catch.
>>>>
>>>> I see a similar buffer overflow is handled by calling seq_buf_clear()
>>>> (see process_durations() or show_user_instructions()).
>>>>
>>>> How about handling this by calling rdt_last_cmd_clear() before printing
>>>> each group?
>>>
>>> Does this work?
>>>
>>> Once seq_buf_has_overflowed() returns nonzero, data has been lost, no?
>>> So far as I can see, show_user_instructions() just gives up on printing
>>> the affected line, while process_durations() tries to anticipate
>>> overflow and prints out the accumulated text to dmesg before clearing
>>> the buffer.
>>
>> Yeah, agreed.
>>
>>>
>>> In our case, we cannot send more data to userspace than was requested
>>> in the read() call, so we might have nowhere to drain the seq_buf
>>> contents to in order to free up space.
>>>
>>> sysfs "expects" userspace to do a big enough read() that this problem
>>> doesn't happen. In practice this is OK because people usually read
>>> through a buffered I/O layer like stdio, and in realistic
>>> implementations the user-side I/O buffer is large enough to hide this
>>> issue.
>>>
>>> But mbm_assign_control data is dynamically generated and potentially
>>> much bigger than a typical sysfs file.
>>
>> I have no idea how to handle this case. We may have to live with this
>> problem. Let us know if there are any ideas.
>
> I think the current implication is that this will work for now provided
> that the generated text fits in a page.
>
>
> Reinette, what's your view on accepting this limitation in the interest
> of stabilising this series, and tidying up this corner case later?
>
> As for writes to this file, we're unlikely to hit the limit unless
> there are a lot of RMIDs available and many groups with excessively
> long names.

I am more concerned about reads of this file. If only 4K writes are
supported then user space can still reconfigure the system in page-sized
portions. That may not be efficient if the user wants to reconfigure the
entire system, but it will work. The problem with reads is that if reads
larger than 4K are required but not supported, then it is impossible for
user space to learn the state of the system.
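
To make that concrete: a reader really wants the whole state in one
pass. A minimal user space sketch (the file path is my assumption based
on this series, not settled ABI):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	/* Buffer sized well above PAGE_SIZE so a single read() can
	 * return everything; a kernel side limited to one page would
	 * silently truncate the state reported here. */
	size_t len = 1 << 20;
	char *buf = malloc(len);
	int fd = open("/sys/fs/resctrl/info/L3_MON/mbm_assign_control",
		      O_RDONLY);
	ssize_t n;

	if (!buf || fd < 0)
		return 1;
	n = read(fd, buf, len);	/* one large read, no stdio buffering */
	if (n > 0)
		fwrite(buf, 1, (size_t)n, stdout);
	close(fd);
	free(buf);
	return 0;
}

If the kernel can only ever fill a page, the tail of the state is
simply unreachable from user space no matter how the reader is written.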

We may already be at that limit. Peter described [1] how AMD systems
already have 32 L3 monitoring domains and 256 RMIDs. With conservative
resource group names of 10 characters, one line per monitoring group
could look like the line below and thus easily exceed 200 characters:
resgroupAA/mongroupAA/0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;6=tl;7=tl;8=tl;9=tl;10=tl;11=tl;12=tl;13=tl;14=tl;15=tl;16=tl;17=tl;18=tl;19=tl;20=tl;21=tl;22=tl;23=tl;24=tl;25=tl;26=tl;27=tl;28=tl;29=tl;30=tl;31=tl;32=tl
Multiplying that by the 256 monitor groups possible today gives roughly
200 x 256 = ~50K of text, so the one-page limit is already far exceeded.
I understand that all domains having "tl" flags is not possible today, but
even if every flag is reduced to "_" the resulting display still looks
likely to exceed a page when many RMIDs are in use.
>
> This looks perfectly fixable, but it might be better to settle the
> design of this series first before we worry too much about it.

I think it is fair to delay support for writing more than a page of
data, but it looks to me like we need a solution for displaying more
than a page of data to user space.
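
One possible direction, as a minimal sketch only: emit one monitoring
group per seq_file iteration step so that the core can refill the
buffer between records instead of formatting everything from a single
show() callback. struct mon_group, mon_group_list, and the op names
below are illustrative stand-ins, not the actual resctrl structures:

#include <linux/list.h>
#include <linux/seq_file.h>

/* Illustrative stand-in for the real resctrl group bookkeeping. */
struct mon_group {
	struct list_head	list;
	char			name[64];
};
static LIST_HEAD(mon_group_list);

static void *mbm_assign_seq_start(struct seq_file *m, loff_t *pos)
{
	return seq_list_start(&mon_group_list, *pos);
}

static void *mbm_assign_seq_next(struct seq_file *m, void *v, loff_t *pos)
{
	return seq_list_next(v, &mon_group_list, pos);
}

static void mbm_assign_seq_stop(struct seq_file *m, void *v)
{
}

static int mbm_assign_seq_show(struct seq_file *m, void *v)
{
	struct mon_group *grp = list_entry(v, struct mon_group, list);

	/*
	 * One record per iteration: when a record overflows the
	 * buffer, seq_file grows it and retries that record, so the
	 * total output is not limited to a single page.
	 */
	seq_printf(m, "%s/...=...\n", grp->name);
	return 0;
}

static const struct seq_operations mbm_assign_seq_ops = {
	.start	= mbm_assign_seq_start,
	.next	= mbm_assign_seq_next,
	.stop	= mbm_assign_seq_stop,
	.show	= mbm_assign_seq_show,
};

Whether this can be wired through resctrl's kernfs ops is something I
have not checked, so treat it purely as a sketch of the seq_file side.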
Reinette
[1] https://lore.kernel.org/lkml/20241106154306.2721688-2-peternewman@google.com/