lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <cb6764a9-73f4-48b2-bffe-1851b6904b0a@shopee.com>
Date: Tue, 20 Feb 2024 10:26:16 +0800
From: Haifeng Xu <haifeng.xu@...pee.com>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: fenghua.yu@...el.com, babu.moger@....com, peternewman@...gle.com,
 james.morse@....com, x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/resctrl: Add tracepoint for llc_occupancy tracking



On 2024/2/6 03:36, Reinette Chatre wrote:
> Hi Haifeng,
> 
> Thank you for proposing this feature. I think it is a useful addition.
> I tried it out after removing a monitor group and it was insightful
> to be able to see how the cache occupancy of a removed group
> shrinks over time.
> 
> On 1/26/2024 5:02 AM, Haifeng Xu wrote:
>> If llc_occupany is enabled, the rmid may not be freed immediately unless
> 
> (llc_occupany -> llc_occupancy ... one more instance below)
> 
> Please use caps (RMID) for acronym.
> 
>> its llc_occupany is less than the resctrl_rmid_realloc_threshold.
> 
> I think it will help to highlight that this is related to monitor group
> removal.
> 
>>
>> In our production environment, those unused rmids get stuck in the limbo
>> list forever because their llc_occupancy are larger than the threshold.
>> After turning it up, we can successfully free unused rmids and create
>> new monitor groups. In order to acquire the llc_occupancy of rmids in
>> each rdt domain, we use perf tool to track and filter the log manually.
> 
> Could you please elaborate how you "use perf tool to track and
> filter the log manually"?
> 
>>
>> It's not efficient enough. Therefore, we can add a new tracepoint that
> 
> "It's not efficient enough." is a vague statement. How was efficiency measured?
> and what is "enough"?
> 
> Please do not impersonate code ("we can add a new ...") and stick to the 
> imperative tone. Please see the "Changelog" section in 
> Documentation/process/maintainer-tip.rst for more details.
> 
> 
>> shows the llc_occupancy of busy rmids when scanning the limbo list. It
>> can help us to adjust the resctrl_rmid_realloc_threshold to a reasonable
>> value.
>>
>> Signed-off-by: Haifeng Xu <haifeng.xu@...pee.com>
>> ---
>>  arch/x86/kernel/cpu/resctrl/Makefile        |  1 +
>>  arch/x86/kernel/cpu/resctrl/monitor.c       |  5 ++++
>>  arch/x86/kernel/cpu/resctrl/monitor_event.h | 30 +++++++++++++++++++++
>>  3 files changed, 36 insertions(+)
>>  create mode 100644 arch/x86/kernel/cpu/resctrl/monitor_event.h
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
>> index 4a06c37b9cf1..0d3031850d26 100644
>> --- a/arch/x86/kernel/cpu/resctrl/Makefile
>> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
>> @@ -1,4 +1,5 @@
>>  # SPDX-License-Identifier: GPL-2.0
>>  obj-$(CONFIG_X86_CPU_RESCTRL)	+= core.o rdtgroup.o monitor.o
>>  obj-$(CONFIG_X86_CPU_RESCTRL)	+= ctrlmondata.o pseudo_lock.o
>> +CFLAGS_monitor.o = -I$(src)
>>  CFLAGS_pseudo_lock.o = -I$(src)
> 
> From what I understand only one of the c files should define CREATE_TRACE_POINTS
> and it is that c file that should have its CFLAGS modified.
> 
> I do not think it is necessary to preemptively fragment the resctrl tracing by creating
> a separate header file for the monitor tracepoints. This is something that may be
> needed as part of current re-design but I think it best to have a simple start that
> places all tracepoints in the same header file.
> 
> I would thus like to propose that this work starts by renaming pseudo_lock_event.h
> to a more generic (trace.h?) that will contain all the tracepoints. pseudo-lock.c
> is the file that define's CREATE_TRACE_POINTS so it should remain as the only
> one with its CFLAGS modified. monitor.c should be able to just include "trace.h"
> (after "trace.h" is updated with the new tracepoint) without needing to define
> CREATE_TRACE_POINTS.
> 
> 
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index f136ac046851..a6f94fcae174 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -24,6 +24,10 @@
>>  
>>  #include "internal.h"
>>  
>> +#define CREATE_TRACE_POINTS
>> +#include "monitor_event.h"
>> +#undef CREATE_TRACE_POINTS
>> +
>>  struct rmid_entry {
>>  	u32				rmid;
>>  	int				busy;
>> @@ -302,6 +306,7 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
>>  			}
>>  		}
>>  		crmid = nrmid + 1;
>> +		trace_rmid_llc_occupancy(nrmid, d->id, val);
>>  	}
>>  }
>>  
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor_event.h b/arch/x86/kernel/cpu/resctrl/monitor_event.h
>> new file mode 100644
>> index 000000000000..91265a2dd2c9
>> --- /dev/null
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor_event.h
>> @@ -0,0 +1,30 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM resctrl
>> +
>> +#if !defined(_TRACE_MONITOR_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_MONITOR_H
>> +
>> +#include <linux/tracepoint.h>
>> +
>> +TRACE_EVENT(rmid_llc_occupancy,
>> +	    TP_PROTO(u32 rmid, int id, u64 occupancy),
>> +	    TP_ARGS(rmid, id, occupancy),
>> +	    TP_STRUCT__entry(__field(u32, rmid)
>> +			     __field(int, id)
>> +			     __field(u64, occupancy)),
>> +	    TP_fast_assign(__entry->rmid = rmid;
>> +			   __entry->id = id;
>> +			   __entry->occupancy = occupancy;),
>> +	    TP_printk("rmid=%u domain=%d occupancy=%llu",
>> +		      __entry->rmid, __entry->id, __entry->occupancy)
>> +	   );
>> +
> 
> Naming is always complicated but I would like to make two proposals with
> motivation:
> a) Please prefix this event with "mon_" as the first member of a new
>    "monitor" category of resctrl tracepoints. Users can then interact
>    with monitor tracepoints with convenience of "mon*". Later we could
>    perhaps add a new "alloc*" category.
> b) Please replace all "rmid" exposure to user space with a more generic
>    "mon_hw_id", or if "mon" is clear, just "hw_id". For reference you
>    can search for "mon_hw_id" in Documentation/arch/x86/resctrl.rst.
>    This is needed because resctrl is in the process of being able to
>    be an interface for Arm's MPAM and "rmid" is an x86 specific term.
> 
> Considering this, what do you think of something like:
> tracepoint name: mon_llc_occupancy_limbo
> print: "mon_hw_id=%u domain=%d llc_occupancy=%llu"
> 
>> +#endif /* _TRACE_MONITOR_H */
>> +
>> +#undef TRACE_INCLUDE_PATH
>> +#define TRACE_INCLUDE_PATH .
>> +#define TRACE_INCLUDE_FILE monitor_event
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
> 
> Thank you.
> 
> Reinette


Thanks for your suggestions and I'll post a new version later.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ