lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4da41bfd-6949-4f85-bd1c-d1bd0acb9686@shopee.com>
Date: Tue, 19 Mar 2024 14:38:35 +0800
From: Haifeng Xu <haifeng.xu@...pee.com>
To: Reinette Chatre <reinette.chatre@...el.com>, james.morse@....com
Cc: fenghua.yu@...el.com, babu.moger@....com, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
 peternewman@...gle.com, x86@...nel.org, linux-kernel@...r.kernel.org,
 corbet@....net, linux-doc@...r.kernel.org
Subject: Re: [PATCH v5 2/2] x86/resctrl: Add tracepoint for llc_occupancy
 tracking



On 2024/3/14 06:47, Reinette Chatre wrote:
> Hi Haifeng,
> 
> On 3/7/2024 11:41 PM, Haifeng Xu wrote:
>> In our production environment, after removing monitor groups, those unused
>> RMIDs get stuck in the limbo list forever because their llc_occupancy are
>> always larger than the threshold. But the unused RMIDs can be successfully
>> freed by turning up the threshold.
>>
>> In order to know how much the threshold should be, perf can be used to
>> acquire the llc_occupancy of RMIDs in each rdt domain.
>>
>> Instead of using perf tool to track llc_occupancy and filter the log
>> manually, it is more convenient for users to use tracepoint to do this
>> work. So add a new tracepoint that shows the llc_occupancy of busy RMIDs
>> when scanning the limbo list.
>>
>> Signed-off-by: Haifeng Xu <haifeng.xu@...pee.com>
>> Suggested-by: Reinette Chatre <reinette.chatre@...el.com>
>> Suggested-by: James Morse <james.morse@....com>
>> Reviewed-by: James Morse <james.morse@....com>
>> ---
>>  Documentation/arch/x86/resctrl.rst    |  8 ++++++++
>>  arch/x86/kernel/cpu/resctrl/monitor.c |  9 +++++++++
>>  arch/x86/kernel/cpu/resctrl/trace.h   | 16 ++++++++++++++++
>>  3 files changed, 33 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index a6279df64a9d..dd3507dc765c 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
>>  each bit represents 5% of the capacity of the cache. You could partition
>>  the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
>>  
>> +Tracepoint - mon_llc_occupancy_limbo
>> +------------------------------------
> 
> I think that the below paragraph would fit nicely as a new paragraph in the
> existing "max_threshold_occupancy - generic concepts" section. To support that
> just one change to text below ...
> 
>> +This tracepoint gives you the precise occupancy values for a subset of RMID
> 
> The mon_llc_occupancy_limbo tracepoint gives the precise occupancy in bytes
> for a subset of RMID ...
> 

OK, I'll move the descriptions to "max_threshold_occupancy - generic concepts" section.

>> +that are not immediately available for allocation. This can't be relied on
>> +to produce output every second, it may be necessary to attempt to create an
>> +empty monitor group to force an update. Output may only be produced if creation
>> +of a control or monitor group fails.
>> +
>>  Memory bandwidth Allocation and monitoring
>>  ==========================================
>>  
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index c34a35ec0f03..60b6a29a9e29 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -24,6 +24,7 @@
>>  #include <asm/resctrl.h>
>>  
>>  #include "internal.h"
>> +#include "trace.h"
>>  
>>  /**
>>   * struct rmid_entry - dirty tracking for all RMID.
>> @@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>>  			rmid_dirty = true;
>>  		} else {
>>  			rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
>> +
>> +			/* x86's CLOSID and RMID are independent numbers, so the entry's
>> +			 * closid is a invalid CLOSID. But on arm64, the RMID value isn't
>> +			 * a unique number for each CLOSID. It's necessary to track both
>> +			 * CLOSID and RMID because there may be dependencies between each
>> +			 * other on some architectures.
>> +			 */
> 
> Please watch for proper formatting of multi-line comment and consistent capitalization.
> I also think comment can be more accurate, for example:
> 
> 	/*
> 	 * x86's CLOSID and RMID are independent numbers, so the entry's
>  	 * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
> 	 * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't used
> 	 * to select the configuration. It is thus necessary to track both
> 	 * CLOSID and RMID because there may be dependencies between them
> 	 * on some architectures.
> 	 */
> 

Thanks for your suggestions!

>> +			trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
>>  		}
>>  
>>  		if (force_free || !rmid_dirty) {
>> diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
>> index ed5c66b8ab0b..b310b4985b94 100644
>> --- a/arch/x86/kernel/cpu/resctrl/trace.h
>> +++ b/arch/x86/kernel/cpu/resctrl/trace.h
>> @@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
>>  	    TP_printk("hits=%llu miss=%llu",
>>  		      __entry->l3_hits, __entry->l3_miss));
>>  
>> +TRACE_EVENT(mon_llc_occupancy_limbo,
>> +	    TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
>> +	    TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
>> +	    TP_STRUCT__entry(__field(u32, ctrl_hw_id)
>> +			     __field(u32, mon_hw_id)
>> +			     __field(int, domain_id)
>> +			     __field(u64, llc_occupancy_bytes)),
>> +	    TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
>> +			   __entry->mon_hw_id = mon_hw_id;
>> +			   __entry->domain_id = domain_id;
>> +			   __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
>> +	    TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_d=%d llc_occupancy_bytes=%llu",
> 
> domain_d -> domain_id
> 
>> +		      __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
>> +		      __entry->llc_occupancy_bytes)
>> +	   );
>> +
>>  #endif /* _TRACE_RESCTRL_H */
>>  
>>  #undef TRACE_INCLUDE_PATH
> 
> 
> Reinette

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ