Message-Id: <20240304065236.281877-3-haifeng.xu@shopee.com>
Date: Mon, 4 Mar 2024 14:52:36 +0800
From: Haifeng Xu <haifeng.xu@...pee.com>
To: reinette.chatre@...el.com,
james.morse@....com
Cc: fenghua.yu@...el.com,
babu.moger@....com,
peternewman@...gle.com,
x86@...nel.org,
linux-kernel@...r.kernel.org,
Haifeng Xu <haifeng.xu@...pee.com>
Subject: [PATCH v4 2/2] x86/resctrl: Add tracepoint for llc_occupancy tracking
In our production environment, after monitor groups are removed, the unused
RMIDs get stuck in the limbo list forever because their llc_occupancy is
always larger than the threshold. However, these RMIDs can be freed
successfully by raising the threshold.
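The C below is a simplified userspace sketch of why raising the threshold releases these RMIDs. It is not the kernel code. It only mirrors the `val >= resctrl_rmid_realloc_threshold` comparison made in __check_limbo(). `struct limbo_rmid`, `rmid_is_dirty()` and `count_freeable()` are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an RMID sitting in the limbo list. */
struct limbo_rmid {
	uint32_t rmid;
	uint64_t llc_occupancy_bytes;	/* last occupancy read for this RMID */
};

/*
 * Mirrors the dirty test in __check_limbo(): an RMID stays in limbo
 * while its occupancy is at or above the reallocation threshold.
 */
static int rmid_is_dirty(uint64_t occupancy, uint64_t threshold)
{
	return occupancy >= threshold;
}

/* Count how many limbo RMIDs would be released at a given threshold. */
static size_t count_freeable(const struct limbo_rmid *entries, size_t n,
			     uint64_t threshold)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!rmid_is_dirty(entries[i].llc_occupancy_bytes, threshold))
			freed++;
	}
	return freed;
}
```

Take three RMIDs whose occupancies stay at 4096, 65536 and 1048576 bytes. A threshold of 4096 frees none of them. A threshold of 1048577 frees all three. This matches the behaviour described in the paragraph above.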
To determine what the threshold should be, perf can be used to read the
llc_occupancy of RMIDs in each rdt domain.
Instead of using the perf tool to track llc_occupancy and filtering the
log manually, it is more convenient for users to rely on a tracepoint.
So add a new tracepoint that reports the llc_occupancy of busy RMIDs
when the limbo list is scanned.
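The C below sketches how output from the new event could be turned into a threshold estimate. It parses records in the field order produced by this patch's TP_printk(): ctrl_hw_id, mon_hw_id, the domain, then llc_occupancy_bytes. It then tracks the largest occupancy seen per domain. The ftrace line prefix (task, CPU, timestamp, event name) is skipped by searching for the first field. The domain key is matched with a `%*[^=]` scanset, so only the field order matters. `struct limbo_sample`, `parse_limbo_sample()` and `max_occupancy_for_domain()` are hypothetical names, not part of any existing tool.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one mon_llc_occupancy_limbo record. */
struct limbo_sample {
	unsigned int ctrl_hw_id;
	unsigned int mon_hw_id;
	int domain_id;
	unsigned long long llc_occupancy_bytes;
};

/*
 * Parse one trace line. Returns 1 if all four fields were read,
 * 0 if the line is not a limbo record.
 */
static int parse_limbo_sample(const char *line, struct limbo_sample *s)
{
	const char *p = strstr(line, "ctrl_hw_id=");

	if (!p)
		return 0;
	/* %*[^=]= consumes the domain key without assigning it. */
	return sscanf(p, "ctrl_hw_id=%u mon_hw_id=%u %*[^=]=%d llc_occupancy_bytes=%llu",
		      &s->ctrl_hw_id, &s->mon_hw_id, &s->domain_id,
		      &s->llc_occupancy_bytes) == 4;
}

/*
 * Largest occupancy reported for one domain (0 if none). An RMID stays
 * dirty while occupancy >= threshold, so a threshold above this value
 * would let every sampled RMID in that domain be freed.
 */
static unsigned long long max_occupancy_for_domain(const char *const *lines,
						     size_t n, int domain_id)
{
	unsigned long long max = 0;
	struct limbo_sample s;

	for (size_t i = 0; i < n; i++) {
		if (parse_limbo_sample(lines[i], &s) &&
		    s.domain_id == domain_id &&
		    s.llc_occupancy_bytes > max)
			max = s.llc_occupancy_bytes;
	}
	return max;
}
```

The per-domain maximum is only a starting point. The value would then be written to resctrl's max_threshold_occupancy file, and the kernel may round it to the hardware's occupancy granularity.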
Signed-off-by: Haifeng Xu <haifeng.xu@...pee.com>
Suggested-by: Reinette Chatre <reinette.chatre@...el.com>
Suggested-by: James Morse <james.morse@....com>
---
Documentation/arch/x86/resctrl.rst | 8 ++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 9 +++++++++
arch/x86/kernel/cpu/resctrl/trace.h | 16 ++++++++++++++++
3 files changed, 33 insertions(+)
diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a6279df64a9d..dd3507dc765c 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -478,6 +478,14 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask
each bit represents 5% of the capacity of the cache. You could partition
the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
+Tracepoint - mon_llc_occupancy_limbo
+------------------------------------
+This tracepoint gives you the precise occupancy values for a subset of RMIDs
+that are not immediately available for allocation. It can't be relied on
+to produce output every second; it may be necessary to attempt to create an
+empty monitor group to force an update. Output may only be produced if creation
+of a control or monitor group fails.
+
Memory bandwidth Allocation and monitoring
==========================================
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c34a35ec0f03..60b6a29a9e29 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -24,6 +24,7 @@
#include <asm/resctrl.h>
#include "internal.h"
+#include "trace.h"
/**
* struct rmid_entry - dirty tracking for all RMID.
@@ -354,6 +355,14 @@ void __check_limbo(struct rdt_domain *d, bool force_free)
rmid_dirty = true;
} else {
rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+
+ /*
+ * On x86, CLOSID and RMID are independent numbers, so entry->closid
+ * is an invalid CLOSID. On arm64, however, an RMID value is not
+ * unique across CLOSIDs. Trace both CLOSID and RMID because some
+ * architectures have dependencies between the two.
+ */
+ trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->id, val);
}
if (force_free || !rmid_dirty) {
diff --git a/arch/x86/kernel/cpu/resctrl/trace.h b/arch/x86/kernel/cpu/resctrl/trace.h
index ed5c66b8ab0b..b310b4985b94 100644
--- a/arch/x86/kernel/cpu/resctrl/trace.h
+++ b/arch/x86/kernel/cpu/resctrl/trace.h
@@ -35,6 +35,22 @@ TRACE_EVENT(pseudo_lock_l3,
TP_printk("hits=%llu miss=%llu",
__entry->l3_hits, __entry->l3_miss));
+TRACE_EVENT(mon_llc_occupancy_limbo,
+ TP_PROTO(u32 ctrl_hw_id, u32 mon_hw_id, int domain_id, u64 llc_occupancy_bytes),
+ TP_ARGS(ctrl_hw_id, mon_hw_id, domain_id, llc_occupancy_bytes),
+ TP_STRUCT__entry(__field(u32, ctrl_hw_id)
+ __field(u32, mon_hw_id)
+ __field(int, domain_id)
+ __field(u64, llc_occupancy_bytes)),
+ TP_fast_assign(__entry->ctrl_hw_id = ctrl_hw_id;
+ __entry->mon_hw_id = mon_hw_id;
+ __entry->domain_id = domain_id;
+ __entry->llc_occupancy_bytes = llc_occupancy_bytes;),
+ TP_printk("ctrl_hw_id=%u mon_hw_id=%u domain_id=%d llc_occupancy_bytes=%llu",
+ __entry->ctrl_hw_id, __entry->mon_hw_id, __entry->domain_id,
+ __entry->llc_occupancy_bytes)
+ );
+
#endif /* _TRACE_RESCTRL_H */
#undef TRACE_INCLUDE_PATH
--
2.25.1