Date: Wed, 24 Jan 2024 14:25:09 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Haifeng Xu <haifeng.xu@...pee.com>
CC: <fenghua.yu@...el.com>, <babu.moger@....com>, <peternewman@...gle.com>,
	<x86@...nel.org>, <linux-kernel@...r.kernel.org>, James Morse
	<james.morse@....com>
Subject: Re: [PATCH 3/3] x86/resctrl: Display cache occupancy of busy RMIDs

(+James)

Hi Haifeng,

On 1/23/2024 1:20 AM, Haifeng Xu wrote:
> If llc_occupancy is enabled, the RMID may not be freed immediately unless
> its llc_occupancy is less than the resctrl_rmid_realloc_threshold.
> 
> In our production environment, those unused RMIDs get stuck in the limbo
> list forever because their llc_occupancy values are larger than the
> threshold. After raising the threshold, we can successfully free unused
> RMIDs and create new monitor groups. In order to acquire the llc_occupancy
> of RMIDs in each rdt domain, we use the perf tool to track and filter the
> log manually.
> 
> That is not efficient enough. Therefore, we can add an RFTYPE_TOP_INFO
> file 'busy_rmids_info' that tells users the llc_occupancy of busy RMIDs.
> It can also help users decide what value to set for the
> resctrl_rmid_realloc_threshold.

I am addressing both patch 2/3 and patch 3/3 here.

First, please note that resctrl is obtaining support for Arm's Memory
System Resource Partitioning and Monitoring (MPAM), and MPAM's monitoring
is done with a monitoring group that is dependent on the control group,
not independent as on Intel and AMD. Please see [1] for more details.

resctrl is the generic interface that will be used to interact with RDT
on Intel, PQoS on AMD, and also MPAM on Arm. We thus need to ensure that
the interface is appropriate for all. Specifically, Arm has no global
"free RMID list": on Arm the free RMIDs (PMG in Arm language, but rmid is
the term that made it into resctrl) are per control group.

Second, this addition seems to be purely a debugging aid. I thus don't see
this as something that users may want/need all the time, yet when users do
want/need it, accurate data is preferred. To that end, the limbo
code already walks the busy list once per second. What if there is a
new tracepoint within the limbo code that shares the exact data used during
limbo list management? From what I can tell, this data, combined with the
per-monitor-group "mon_hw_id", should give user space sufficient data to
debug the scenarios mentioned in these patches.
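
To illustrate the idea, here is a rough user-space sketch of a limbo walk
that reports each busy RMID's occupancy at the point where it is compared
against the threshold. It is not kernel code and not a proposed patch: the
struct, the function names, and the callback standing in for a tracepoint
are all hypothetical. The only behavior taken from the patch description is
that an RMID leaves limbo once its llc_occupancy is below the threshold.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of one busy RMID as seen from one monitoring domain. */
struct limbo_entry {
	unsigned int rmid;                  /* hardware monitoring ID */
	unsigned int domain_id;             /* domain the reading belongs to */
	unsigned long long occupancy_bytes; /* llc_occupancy reading */
	bool busy;                          /* still in limbo in this domain */
};

/* Stand-in for a tracepoint: receives exactly the data the check uses. */
typedef void (*limbo_trace_fn)(const struct limbo_entry *e);

/* Example "trace" consumer that just prints the reading. */
static void limbo_trace_print(const struct limbo_entry *e)
{
	printf("limbo: domain=%u rmid=%u llc_occupancy=%llu\n",
	       e->domain_id, e->rmid, e->occupancy_bytes);
}

/*
 * One pass over the busy entries. Every busy entry is reported through
 * 'trace' (if set) before the threshold check, so the observer sees the
 * same value the decision is based on. An entry is released when its
 * occupancy is below 'threshold'. Returns the number of entries released.
 */
static size_t limbo_walk(struct limbo_entry *entries, size_t n,
			 unsigned long long threshold, limbo_trace_fn trace)
{
	size_t released = 0;

	for (size_t i = 0; i < n; i++) {
		struct limbo_entry *e = &entries[i];

		if (!e->busy)
			continue;
		if (trace)
			trace(e);
		if (e->occupancy_bytes < threshold) {
			e->busy = false;
			released++;
		}
	}
	return released;
}
```

Calling limbo_walk() with limbo_trace_print as the callback prints one line
per busy entry on each pass. User space could match the reported rmid
against a group's "mon_hw_id" to see which RMIDs stay above the threshold.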

I did add James to this discussion to make him aware of your requirements.
Please do include him in future submissions.

Reinette

[1] https://lore.kernel.org/all/20231215174343.13872-1-james.morse@arm.com/
