Message-Id: <1461905018-86355-11-git-send-email-davidcc@google.com>
Date:	Thu, 28 Apr 2016 21:43:16 -0700
From:	David Carrillo-Cisneros <davidcc@...gle.com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ingo Molnar <mingo@...hat.com>
Cc:	Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
	Matt Fleming <matt.fleming@...el.com>,
	Tony Luck <tony.luck@...el.com>,
	Stephane Eranian <eranian@...gle.com>,
	Paul Turner <pjt@...gle.com>,
	David Carrillo-Cisneros <davidcc@...gle.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH 10/32] perf/x86/intel/cqm: basic RMID hierarchy with per package rmids

Cgroups and/or tasks that need to be monitored using an RMID are
abstracted as MOnitored Resources (monrs). A CQM event points to a
monr in order to read the occupancy (and, in the future, other
attributes) of the RMIDs associated with that monr.

The monrs form a hierarchy that captures the dependencies between the
monitored cgroups and/or tasks/threads. The monr of a cgroup A that
contains another monitored cgroup B is an ancestor of B's monr.

Each monr contains one Package MONitored Resource (pmonr) per package.
The monitoring of a monr in a package starts when its corresponding
pmonr receives an RMID for that package (a prmid).

The prmids are lazily assigned to a pmonr the first time a thread
using the monr is scheduled in on the package. When a pmonr with a
valid prmid is scheduled in, the RMID of that pmonr's prmid is written
to the MSR_IA32_PQR_ASSOC msr. If no prmid is available, the prmid of
the lowest ancestor in the monr hierarchy that has a valid prmid for
that package is used instead.
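
To illustrate the lazy assignment and the ancestor fallback, here is a
minimal userspace sketch (not the kernel implementation; it ignores
locking and the prmid_summary fast path, and the names are only
illustrative) of how the RMID to program into PQR_ASSOC could be chosen
for a pmonr at sched in:

#include <stdint.h>

struct prmid { uint32_t rmid; };

/* Simplified per-package node: its own prmid (if any) plus the parent. */
struct pmonr {
	struct prmid *prmid;	/* non-NULL only in (A)ctive state */
	struct pmonr *parent;	/* pmonr of the parent monr, same package */
};

/*
 * Pick the RMID to write into MSR_IA32_PQR_ASSOC: the pmonr's own prmid
 * when it has one, otherwise the prmid of its lowest ancestor that has
 * one. The root pmonr always holds RMID 0, so the walk terminates.
 */
static uint32_t sched_in_rmid(struct pmonr *p)
{
	while (p && !p->prmid)
		p = p->parent;
	return p ? p->prmid->rmid : 0;
}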

A pmonr can be in one of the following three states (a sketch of how
these states are encoded follows this list):
  - (A)ctive: When it has a prmid available.
  - (I)nherited: When no prmid is available. In this state it "borrows"
    the prmid of its lowest (A)ctive ancestor during sched in (the
    ancestor's RMID is written into hw while any associated thread
    executes). But, since the "borrowed" prmid does not monitor the
    occupancy of this monr, the monr cannot report occupancy
    individually.
  - (U)nused: When the pmonr does not have a prmid yet and has not
    failed to acquire one (either because no thread has been scheduled
    in while monitoring for this pmonr is active, or because it has
    completed a transition to (U)state, i.e. termination of the
    associated event/cgroup).
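
The following is a rough userspace sketch (not part of the patch) of
how the three states can be told apart from a prmid_summary snapshot.
The INVALID_RMID value and the field layout mirror the definitions
added to cqm.h below; the (I)state predicate is an assumption here,
since that state only arrives with the next patch in the series:

#include <stdbool.h>
#include <stdint.h>

#define INVALID_RMID ((uint32_t)-1)

union prmid_summary {
	long long value;		/* one-word accessor */
	struct {
		uint32_t sched_rmid;	/* RMID written to PQR_ASSOC */
		uint32_t read_rmid;	/* RMID occupancy is read from */
	};
};

/* (A)ctive: the pmonr owns a prmid; both rmids are valid and equal. */
static bool summary_is_astate(union prmid_summary s)
{
	return s.sched_rmid != INVALID_RMID && s.sched_rmid == s.read_rmid;
}

/* (U)nused: nothing to schedule; read_rmid is 0 iff monitoring is active. */
static bool summary_is_ustate(union prmid_summary s)
{
	return s.sched_rmid == INVALID_RMID;
}

/* (I)nherited (next patch): schedules an ancestor's RMID, cannot read own. */
static bool summary_is_istate(union prmid_summary s)
{
	return s.sched_rmid != INVALID_RMID && s.read_rmid == INVALID_RMID;
}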

To avoid synchronization overhead, each pmonr contains a prmid_summary.
The union prmid_summary is a concise representation of the pmonr's
state and its raw RMIDs. Because it fits in a machine word, the
prmid_summary can be read atomically without a LOCK instruction. Every
state transition atomically updates the prmid_summary. This avoids
locking during sched in and out of threads, except when a prmid needs
to be allocated, which only occurs the first time a monr is scheduled
in a package.
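
A rough C11 sketch (with hypothetical names, using a pthread mutex to
stand in for the per-package lock) of the publish/read pattern that the
word-sized summary enables: state transitions serialize on the lock and
publish the whole summary with one atomic store, while the sched in/out
fast path only needs one atomic load:

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

union prmid_summary {
	int64_t value;
	struct {
		uint32_t sched_rmid;
		uint32_t read_rmid;
	};
};

struct pmonr_sketch {
	_Atomic int64_t summary;	/* whole summary published atomically */
	pthread_mutex_t lock;		/* stands in for pkg_data_lock */
};

/* Writer side: a state transition updates both rmids in one shot. */
static void publish_summary(struct pmonr_sketch *p, uint32_t sched, uint32_t read)
{
	union prmid_summary s;

	s.sched_rmid = sched;
	s.read_rmid = read;

	pthread_mutex_lock(&p->lock);	/* lock assumed initialized elsewhere */
	atomic_store(&p->summary, s.value);
	pthread_mutex_unlock(&p->lock);
}

/* Reader side (task switch fast path): one atomic load, no lock taken. */
static union prmid_summary read_summary(struct pmonr_sketch *p)
{
	union prmid_summary s;

	s.value = atomic_load(&p->summary);
	return s;
}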

This patch introduces a first iteration of the monr hierarchy
that maintains two levels: the root monr at the top, and all other
monrs as leaves. The root monr is always (A)ctive.

This patch also implements the essential mechanism of per-package lazy
allocation of RMIDs.

The (I)state and the transitions from and to it are introduced in the
next patch in this series.

Reviewed-by: Stephane Eranian <eranian@...gle.com>
Signed-off-by: David Carrillo-Cisneros <davidcc@...gle.com>
---
 arch/x86/events/intel/cqm.c | 633 ++++++++++++++++++++++++++++++++++++--------
 arch/x86/events/intel/cqm.h | 149 +++++++++++
 include/linux/perf_event.h  |   2 +-
 3 files changed, 674 insertions(+), 110 deletions(-)

diff --git a/arch/x86/events/intel/cqm.c b/arch/x86/events/intel/cqm.c
index 541e515..65551bb 100644
--- a/arch/x86/events/intel/cqm.c
+++ b/arch/x86/events/intel/cqm.c
@@ -35,28 +35,66 @@ static struct perf_pmu_events_attr event_attr_##v = {				\
 static LIST_HEAD(cache_groups);
 static DEFINE_MUTEX(cqm_mutex);
 
+struct monr *monr_hrchy_root;
+
 struct pkg_data *cqm_pkgs_data[PQR_MAX_NR_PKGS];
 
-/*
- * Is @rmid valid for programming the hardware?
- *
- * rmid 0 is reserved by the hardware for all non-monitored tasks, which
- * means that we should never come across an rmid with that value.
- * Likewise, an rmid value of -1 is used to indicate "no rmid currently
- * assigned" and is used as part of the rotation code.
- */
-static inline bool __rmid_valid(u32 rmid)
+static inline bool __pmonr__in_astate(struct pmonr *pmonr)
 {
-	if (!rmid || rmid == INVALID_RMID)
-		return false;
+	lockdep_assert_held(&__pkg_data(pmonr, pkg_data_lock));
+	return pmonr->prmid;
+}
 
-	return true;
+static inline bool __pmonr__in_ustate(struct pmonr *pmonr)
+{
+	lockdep_assert_held(&__pkg_data(pmonr, pkg_data_lock));
+	return !pmonr->prmid;
 }
 
-static u64 __rmid_read(u32 rmid)
+static inline bool monr__is_root(struct monr *monr)
 {
-	/* XXX: Placeholder, will be removed in next patch. */
-	return 0;
+	return monr_hrchy_root == monr;
+}
+
+static inline bool monr__is_mon_active(struct monr *monr)
+{
+	return monr->flags & MONR_MON_ACTIVE;
+}
+
+static inline void __monr__set_summary_read_rmid(struct monr *monr, u32 rmid)
+{
+	int i;
+	struct pmonr *pmonr;
+	union prmid_summary summary;
+
+	monr_hrchy_assert_held_raw_spin_locks();
+
+	cqm_pkg_id_for_each_online(i) {
+		pmonr = monr->pmonrs[i];
+		WARN_ON_ONCE(!__pmonr__in_ustate(pmonr));
+		summary.value = atomic64_read(&pmonr->prmid_summary_atomic);
+		summary.read_rmid = rmid;
+		atomic64_set(&pmonr->prmid_summary_atomic, summary.value);
+	}
+}
+
+static inline void __monr__set_mon_active(struct monr *monr)
+{
+	monr_hrchy_assert_held_raw_spin_locks();
+	__monr__set_summary_read_rmid(monr, 0);
+	monr->flags |= MONR_MON_ACTIVE;
+}
+
+/*
+ * All pmonrs must be in (U)state.
+ * clearing MONR_MON_ACTIVE prevents (U)state prmids from transitioning
+ * to another state.
+ */
+static inline void __monr__clear_mon_active(struct monr *monr)
+{
+	monr_hrchy_assert_held_raw_spin_locks();
+	__monr__set_summary_read_rmid(monr, INVALID_RMID);
+	monr->flags &= ~MONR_MON_ACTIVE;
 }
 
 /*
@@ -133,22 +171,6 @@ static inline bool __valid_pkg_id(u16 pkg_id)
 	return pkg_id < PQR_MAX_NR_PKGS;
 }
 
-/*
- * Returns < 0 on fail.
- *
- * We expect to be called with cache_mutex held.
- */
-static u32 __get_rmid(void)
-{
-	/* XXX: Placeholder, will be removed in next patch. */
-	return 0;
-}
-
-static void __put_rmid(u32 rmid)
-{
-	/* XXX: Placeholder, will be removed in next patch. */
-}
-
 /* Init cqm pkg_data for @cpu 's package. */
 static int pkg_data_init_cpu(int cpu)
 {
@@ -187,6 +209,10 @@ static int pkg_data_init_cpu(int cpu)
 	}
 
 	INIT_LIST_HEAD(&pkg_data->free_prmids_pool);
+	INIT_LIST_HEAD(&pkg_data->active_prmids_pool);
+	INIT_LIST_HEAD(&pkg_data->nopmonr_limbo_prmids_pool);
+
+	INIT_LIST_HEAD(&pkg_data->astate_pmonrs_lru);
 
 	mutex_init(&pkg_data->pkg_data_mutex);
 	raw_spin_lock_init(&pkg_data->pkg_data_lock);
@@ -225,12 +251,129 @@ __prmid_from_rmid(u16 pkg_id, u32 rmid)
 	return prmid;
 }
 
+static struct pmonr *pmonr_alloc(int cpu)
+{
+	struct pmonr *pmonr;
+	union prmid_summary summary;
+
+	pmonr = kmalloc_node(sizeof(struct pmonr),
+			     GFP_KERNEL, cpu_to_node(cpu));
+	if (!pmonr)
+		return ERR_PTR(-ENOMEM);
+
+	pmonr->prmid = NULL;
+
+	pmonr->monr = NULL;
+	INIT_LIST_HEAD(&pmonr->rotation_entry);
+
+	pmonr->pkg_id = topology_physical_package_id(cpu);
+	summary.sched_rmid = INVALID_RMID;
+	summary.read_rmid = INVALID_RMID;
+	atomic64_set(&pmonr->prmid_summary_atomic, summary.value);
+
+	return pmonr;
+}
+
+static void pmonr_dealloc(struct pmonr *pmonr)
+{
+	kfree(pmonr);
+}
+
+/*
+ * @root: Common ancestor.
+ * @a must be distinct from @b.
+ * Returns true if @a is an ancestor of @b.
+ */
+static inline bool
+__monr_hrchy_is_ancestor(struct monr *root,
+			 struct monr *a, struct monr *b)
+{
+	WARN_ON_ONCE(!root || !a || !b);
+	WARN_ON_ONCE(a == b);
+
+	if (root == a)
+		return true;
+	if (root == b)
+		return false;
+
+	b = b->parent;
+	/* Break at the root */
+	while (b != root) {
+		WARN_ON_ONCE(!b);
+		if (a == b)
+			return true;
+		b = b->parent;
+	}
+	return false;
+}
+
+/* helper function to finish transition to astate. */
+static inline void
+__pmonr__finish_to_astate(struct pmonr *pmonr, struct prmid *prmid)
+{
+	union prmid_summary summary;
+
+	lockdep_assert_held(&__pkg_data(pmonr, pkg_data_lock));
+
+	pmonr->prmid = prmid;
+
+	list_move_tail(
+		&prmid->pool_entry, &__pkg_data(pmonr, active_prmids_pool));
+	list_move_tail(
+		&pmonr->rotation_entry, &__pkg_data(pmonr, astate_pmonrs_lru));
+
+	summary.sched_rmid = pmonr->prmid->rmid;
+	summary.read_rmid = pmonr->prmid->rmid;
+	atomic64_set(&pmonr->prmid_summary_atomic, summary.value);
+}
+
+static inline void
+__pmonr__ustate_to_astate(struct pmonr *pmonr, struct prmid *prmid)
+{
+	lockdep_assert_held(&__pkg_data(pmonr, pkg_data_lock));
+	__pmonr__finish_to_astate(pmonr, prmid);
+}
+
+static inline void
+__pmonr__to_ustate(struct pmonr *pmonr)
+{
+	union prmid_summary summary;
+
+	lockdep_assert_held(&__pkg_data(pmonr, pkg_data_lock));
+
+	/* Do not warn when re-entering (U)state; this simplifies cleanup
+	 * of initialized pmonrs that were never scheduled.
+	 */
+	if (__pmonr__in_ustate(pmonr))
+		return;
+
+	if (__pmonr__in_astate(pmonr)) {
+		WARN_ON_ONCE(!pmonr->prmid);
+
+		list_move_tail(&pmonr->prmid->pool_entry,
+			       &__pkg_data(pmonr, nopmonr_limbo_prmids_pool));
+		pmonr->prmid =  NULL;
+	} else {
+		WARN_ON_ONCE(true);
+		return;
+	}
+	list_del_init(&pmonr->rotation_entry);
+
+	summary.sched_rmid = INVALID_RMID;
+	summary.read_rmid  =
+		monr__is_mon_active(pmonr->monr) ? 0 : INVALID_RMID;
+
+	atomic64_set(&pmonr->prmid_summary_atomic, summary.value);
+	WARN_ON_ONCE(!__pmonr__in_ustate(pmonr));
+}
+
 static int intel_cqm_setup_pkg_prmid_pools(u16 pkg_id)
 {
 	int r;
 	unsigned long flags;
 	struct prmid *prmid;
 	struct pkg_data *pkg_data = cqm_pkgs_data[pkg_id];
+	struct pmonr *root_pmonr;
 
 	if (!__valid_pkg_id(pkg_id))
 		return -EINVAL;
@@ -252,12 +395,13 @@ static int intel_cqm_setup_pkg_prmid_pools(u16 pkg_id)
 			&pkg_data->pkg_data_lock, flags, pkg_id);
 		pkg_data->prmids_by_rmid[r] = prmid;
 
+		list_add_tail(&prmid->pool_entry, &pkg_data->free_prmids_pool);
 
 		/* RMID 0 is special and makes the root of rmid hierarchy. */
-		if (r != 0)
-			list_add_tail(&prmid->pool_entry,
-				      &pkg_data->free_prmids_pool);
-
+		if (r == 0) {
+			root_pmonr = monr_hrchy_root->pmonrs[pkg_id];
+			__pmonr__ustate_to_astate(root_pmonr, prmid);
+		}
 		raw_spin_unlock_irqrestore(&pkg_data->pkg_data_lock, flags);
 	}
 	return 0;
@@ -273,6 +417,232 @@ fail:
 }
 
 
+/* Alloc monr with all pmonrs in (U)state. */
+static struct monr *monr_alloc(void)
+{
+	int i;
+	struct pmonr *pmonr;
+	struct monr *monr;
+
+	monr = kmalloc(sizeof(struct monr), GFP_KERNEL);
+
+	if (!monr)
+		return ERR_PTR(-ENOMEM);
+
+	monr->flags = 0;
+	monr->parent = NULL;
+	INIT_LIST_HEAD(&monr->children);
+	INIT_LIST_HEAD(&monr->parent_entry);
+	monr->mon_event_group = NULL;
+
+	/* Iterate over all pkgs, even uninitialized ones. */
+	for (i = 0; i < PQR_MAX_NR_PKGS; i++) {
+		/* Do not create pmonrs for uninitialized packages. */
+		if (!cqm_pkgs_data[i]) {
+			monr->pmonrs[i] = NULL;
+			continue;
+		}
+		/* Rotation cpu is on pmonr's package. */
+		pmonr = pmonr_alloc(cqm_pkgs_data[i]->rotation_cpu);
+		if (IS_ERR(pmonr))
+			goto clean_pmonrs;
+		pmonr->monr = monr;
+		monr->pmonrs[i] = pmonr;
+	}
+	return monr;
+
+clean_pmonrs:
+	while (i--) {
+		if (cqm_pkgs_data[i])
+			kfree(monr->pmonrs[i]);
+	}
+	kfree(monr);
+	return ERR_PTR(PTR_ERR(pmonr));
+}
+
+/* Only can dealloc monrs with all pmonrs in (U)state. */
+static void monr_dealloc(struct monr *monr)
+{
+	int i;
+
+	cqm_pkg_id_for_each_online(i)
+		pmonr_dealloc(monr->pmonrs[i]);
+
+	kfree(monr);
+}
+
+/*
+ * Wrappers for monr manipulation in events.
+ *
+ */
+static inline struct monr *monr_from_event(struct perf_event *event)
+{
+	return (struct monr *) READ_ONCE(event->hw.cqm_monr);
+}
+
+static inline void event_set_monr(struct perf_event *event, struct monr *monr)
+{
+	WRITE_ONCE(event->hw.cqm_monr, monr);
+}
+
+/*
+ * Always finds an RMID to schedule. To be called during task switch.
+ * A lock-free fast path handles the common case in which the RMID for the
+ * current package has been used before.
+ * On failure, check that the monr is active and, if it is, try to obtain a
+ * free RMID and move the pmonr to (A)state.
+ * If that also fails, traverse up the monr hierarchy until a prmid for this
+ * pkg_id is found and move the pmonr to (I)state.
+ * Called during task switch, it sets the pmonr's prmid_summary to the sched
+ * and read rmids that reflect the pmonr's state.
+ */
+static inline void
+monr_hrchy_get_next_prmid_summary(struct pmonr *pmonr)
+{
+	union prmid_summary summary;
+
+	/*
+	 * First, do lock-free fastpath.
+	 */
+	summary.value = atomic64_read(&pmonr->prmid_summary_atomic);
+	if (summary.sched_rmid != INVALID_RMID)
+		return;
+
+	if (!prmid_summary__is_mon_active(summary))
+		return;
+
+	/*
+	 * Lock-free path failed at first attempt. Now acquire lock and repeat
+	 * in case the monr was modified in the mean time.
+	 * This time try to obtain free rmid and update pmonr accordingly,
+	 * instead of failing fast.
+	 */
+	raw_spin_lock_nested(&__pkg_data(pmonr, pkg_data_lock), pmonr->pkg_id);
+
+	summary.value = atomic64_read(&pmonr->prmid_summary_atomic);
+	if (summary.sched_rmid != INVALID_RMID) {
+		raw_spin_unlock(&__pkg_data(pmonr, pkg_data_lock));
+		return;
+	}
+
+	/* Do not try to obtain RMID if monr is not active. */
+	if (!prmid_summary__is_mon_active(summary)) {
+		raw_spin_unlock(&__pkg_data(pmonr, pkg_data_lock));
+		return;
+	}
+
+	/*
+	 * Can only fail if it was in (U)state.
+	 * Try to obtain a free prmid and go to (A)state, if not possible,
+	 * it should go to (I)state.
+	 */
+	WARN_ON_ONCE(!__pmonr__in_ustate(pmonr));
+
+	if (list_empty(&__pkg_data(pmonr, free_prmids_pool))) {
+		/* Failed to obtain a valid rmid in this package for this
+		 * monr. In next patches it will transition to (I)state.
+		 * For now, stay in (U)state (do nothing).
+		 */
+	} else {
+		/* Transition to (A)state using free prmid. */
+		__pmonr__ustate_to_astate(
+			pmonr,
+			list_first_entry(&__pkg_data(pmonr, free_prmids_pool),
+				struct prmid, pool_entry));
+	}
+	raw_spin_unlock(&__pkg_data(pmonr, pkg_data_lock));
+}
+
+static inline void __assert_monr_is_leaf(struct monr *monr)
+{
+	int i;
+
+	monr_hrchy_assert_held_mutexes();
+	monr_hrchy_assert_held_raw_spin_locks();
+
+	cqm_pkg_id_for_each_online(i)
+		WARN_ON_ONCE(!__pmonr__in_ustate(monr->pmonrs[i]));
+
+	WARN_ON_ONCE(!list_empty(&monr->children));
+}
+
+static inline void
+__monr_hrchy_insert_leaf(struct monr *monr, struct monr *parent)
+{
+	monr_hrchy_assert_held_mutexes();
+	monr_hrchy_assert_held_raw_spin_locks();
+
+	__assert_monr_is_leaf(monr);
+
+	list_add_tail(&monr->parent_entry, &parent->children);
+	monr->parent = parent;
+}
+
+static inline void
+__monr_hrchy_remove_leaf(struct monr *monr)
+{
+	/* Since root cannot be removed, monr must have a parent */
+	WARN_ON_ONCE(!monr->parent);
+
+	monr_hrchy_assert_held_mutexes();
+	monr_hrchy_assert_held_raw_spin_locks();
+
+	__assert_monr_is_leaf(monr);
+
+	list_del_init(&monr->parent_entry);
+	monr->parent = NULL;
+}
+
+static int __monr_hrchy_attach_cpu_event(struct perf_event *event)
+{
+	lockdep_assert_held(&cqm_mutex);
+	WARN_ON_ONCE(monr_from_event(event));
+
+	event_set_monr(event, monr_hrchy_root);
+	return 0;
+}
+
+/* task events are always leaves in the monr_hierarchy */
+static int __monr_hrchy_attach_task_event(struct perf_event *event,
+					  struct monr *parent_monr)
+{
+	struct monr *monr;
+	unsigned long flags;
+	int i;
+
+	lockdep_assert_held(&cqm_mutex);
+
+	monr = monr_alloc();
+	if (IS_ERR(monr))
+		return PTR_ERR(monr);
+	event_set_monr(event, monr);
+	monr->mon_event_group = event;
+
+	monr_hrchy_acquire_locks(flags, i);
+	__monr_hrchy_insert_leaf(monr, parent_monr);
+	__monr__set_mon_active(monr);
+	monr_hrchy_release_locks(flags, i);
+
+	return 0;
+}
+
+/*
+ * Find appropriate position in hierarchy and set monr. Create new
+ * monr if necessary.
+ * Locks the monr hierarchy.
+ */
+static int monr_hrchy_attach_event(struct perf_event *event)
+{
+	struct monr *monr_parent;
+
+	if (!event->cgrp && !(event->attach_state & PERF_ATTACH_TASK))
+		return __monr_hrchy_attach_cpu_event(event);
+
+	/* Two-level hierarchy: root and all event monrs underneath it. */
+	monr_parent = monr_hrchy_root;
+	return __monr_hrchy_attach_task_event(event, monr_parent);
+}
+
 /*
  * Determine if @a and @b measure the same set of tasks.
  *
@@ -291,7 +661,7 @@ static bool __match_event(struct perf_event *a, struct perf_event *b)
 		return false;
 #endif
 
-	/* If not task event, we're machine wide */
+	/* If not a task event, it's a cgroup or a non-task cpu event. */
 	if (!(b->attach_state & PERF_ATTACH_TASK))
 		return true;
 
@@ -310,69 +680,51 @@ static bool __match_event(struct perf_event *a, struct perf_event *b)
 	return false;
 }
 
-struct rmid_read {
-	u32 rmid;
-	atomic64_t value;
-};
-
 static struct pmu intel_cqm_pmu;
 
 /*
  * Find a group and setup RMID.
  *
- * If we're part of a group, we use the group's RMID.
+ * If we're part of a group, we use the group's monr.
  */
-static void intel_cqm_setup_event(struct perf_event *event,
-				  struct perf_event **group)
+static int
+intel_cqm_setup_event(struct perf_event *event, struct perf_event **group)
 {
 	struct perf_event *iter;
-	bool conflict = false;
-	u32 rmid;
+	struct monr *monr;
+	*group = NULL;
 
-	list_for_each_entry(iter, &cache_groups, hw.cqm_event_groups_entry) {
-		rmid = iter->hw.cqm_rmid;
+	lockdep_assert_held(&cqm_mutex);
 
+	list_for_each_entry(iter, &cache_groups, hw.cqm_event_groups_entry) {
+		monr = monr_from_event(iter);
 		if (__match_event(iter, event)) {
-			/* All tasks in a group share an RMID */
-			event->hw.cqm_rmid = rmid;
+			/* All tasks in a group share a monr. */
+			event_set_monr(event, monr);
 			*group = iter;
-			return;
+			return 0;
 		}
 	}
-
-	if (conflict)
-		rmid = INVALID_RMID;
-	else
-		rmid = __get_rmid();
-
-	event->hw.cqm_rmid = rmid;
+	/*
+	 * Since no match was found, create a new monr and set this
+	 * event as head of a new cache group. All events in this cache group
+	 * will share the monr.
+	 */
+	return monr_hrchy_attach_event(event);
 }
 
+/* Read current package immediately and remote pkg (if any) from cache. */
 static void intel_cqm_event_read(struct perf_event *event)
 {
-	unsigned long flags;
-	u32 rmid;
-	u64 val;
+	union prmid_summary summary;
+	struct prmid *prmid;
 	u16 pkg_id = topology_physical_package_id(smp_processor_id());
+	struct pmonr *pmonr = monr_from_event(event)->pmonrs[pkg_id];
 
-	raw_spin_lock_irqsave(&cqm_pkgs_data[pkg_id]->pkg_data_lock, flags);
-	rmid = event->hw.cqm_rmid;
-
-	if (!__rmid_valid(rmid))
-		goto out;
-
-	val = __rmid_read(rmid);
-
-	/*
-	 * Ignore this reading on error states and do not update the value.
-	 */
-	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
-		goto out;
-
-	local64_set(&event->count, val);
-out:
-	raw_spin_unlock_irqrestore(
-		&cqm_pkgs_data[pkg_id]->pkg_data_lock, flags);
+	summary.value = atomic64_read(&pmonr->prmid_summary_atomic);
+	prmid = __prmid_from_rmid(pkg_id, summary.read_rmid);
+	cqm_prmid_update(prmid);
+	local64_set(&event->count, atomic64_read(&prmid->last_read_value));
 }
 
 static inline bool cqm_group_leader(struct perf_event *event)
@@ -380,52 +732,81 @@ static inline bool cqm_group_leader(struct perf_event *event)
 	return !list_empty(&event->hw.cqm_event_groups_entry);
 }
 
-static void intel_cqm_event_start(struct perf_event *event, int mode)
+static inline void __intel_cqm_event_start(
+	struct perf_event *event, union prmid_summary summary)
 {
 	u16 pkg_id = topology_physical_package_id(smp_processor_id());
 	if (!(event->hw.state & PERF_HES_STOPPED))
 		return;
 
 	event->hw.state &= ~PERF_HES_STOPPED;
-	__update_pqr_prmid(__prmid_from_rmid(pkg_id, event->hw.cqm_rmid));
+	__update_pqr_prmid(__prmid_from_rmid(pkg_id, summary.sched_rmid));
+}
+
+static void intel_cqm_event_start(struct perf_event *event, int mode)
+{
+	union prmid_summary summary;
+	u16 pkg_id = topology_physical_package_id(smp_processor_id());
+	struct pmonr *pmonr = monr_from_event(event)->pmonrs[pkg_id];
+
+	/* Utilize the most up-to-date pmonr summary. */
+	monr_hrchy_get_next_prmid_summary(pmonr);
+	summary.value = atomic64_read(&pmonr->prmid_summary_atomic);
+	__intel_cqm_event_start(event, summary);
 }
 
 static void intel_cqm_event_stop(struct perf_event *event, int mode)
 {
+	union prmid_summary summary;
 	u16 pkg_id = topology_physical_package_id(smp_processor_id());
+	struct pmonr *root_pmonr = monr_hrchy_root->pmonrs[pkg_id];
+
 	if (event->hw.state & PERF_HES_STOPPED)
 		return;
 
 	event->hw.state |= PERF_HES_STOPPED;
-	intel_cqm_event_read(event);
-	__update_pqr_prmid(__prmid_from_rmid(pkg_id, 0));
+
+	summary.value = atomic64_read(&root_pmonr->prmid_summary_atomic);
+	/* Occupancy of CQM events is obtained at read time. No need to read
+	 * when the event is stopped, since reads on inactive cpus succeed.
+	 */
+	__update_pqr_prmid(__prmid_from_rmid(pkg_id, summary.sched_rmid));
 }
 
 static int intel_cqm_event_add(struct perf_event *event, int mode)
 {
-	unsigned long flags;
-	u32 rmid;
+	struct monr *monr;
+	struct pmonr *pmonr;
+	union prmid_summary summary;
 	u16 pkg_id = topology_physical_package_id(smp_processor_id());
 
-	raw_spin_lock_irqsave(&cqm_pkgs_data[pkg_id]->pkg_data_lock, flags);
+	monr = monr_from_event(event);
+	pmonr = monr->pmonrs[pkg_id];
 
 	event->hw.state = PERF_HES_STOPPED;
-	rmid = event->hw.cqm_rmid;
 
-	if (__rmid_valid(rmid) && (mode & PERF_EF_START))
-		intel_cqm_event_start(event, mode);
+	/* Utilize the most up-to-date pmonr summary. */
+	monr_hrchy_get_next_prmid_summary(pmonr);
+	summary.value = atomic64_read(&pmonr->prmid_summary_atomic);
+
+	if (!prmid_summary__is_mon_active(summary))
+		return -1;
 
-	raw_spin_unlock_irqrestore(
-		&cqm_pkgs_data[pkg_id]->pkg_data_lock, flags);
+	if (mode & PERF_EF_START)
+		__intel_cqm_event_start(event, summary);
+
+	/* (I)state pmonrs cannot report occupancy for themselves. */
 	return 0;
 }
 
 static void intel_cqm_event_destroy(struct perf_event *event)
 {
 	struct perf_event *group_other = NULL;
+	struct monr *monr;
+	int i;
+	unsigned long flags;
 
 	mutex_lock(&cqm_mutex);
-
 	/*
 	 * If there's another event in this group...
 	 */
@@ -435,33 +816,56 @@ static void intel_cqm_event_destroy(struct perf_event *event)
 					       hw.cqm_event_group_entry);
 		list_del(&event->hw.cqm_event_group_entry);
 	}
-
 	/*
 	 * And we're the group leader..
 	 */
-	if (cqm_group_leader(event)) {
-		/*
-		 * If there was a group_other, make that leader, otherwise
-		 * destroy the group and return the RMID.
-		 */
-		if (group_other) {
-			list_replace(&event->hw.cqm_event_groups_entry,
-				     &group_other->hw.cqm_event_groups_entry);
-		} else {
-			u32 rmid = event->hw.cqm_rmid;
-
-			if (__rmid_valid(rmid))
-				__put_rmid(rmid);
-			list_del(&event->hw.cqm_event_groups_entry);
-		}
+	if (!cqm_group_leader(event))
+		goto exit;
+
+	monr = monr_from_event(event);
+
+	/*
+	 * If there was a group_other, make that leader, otherwise
+	 * destroy the group and return the RMID.
+	 */
+	if (group_other) {
+		/* Update monr reference to group head. */
+		monr->mon_event_group = group_other;
+		list_replace(&event->hw.cqm_event_groups_entry,
+			     &group_other->hw.cqm_event_groups_entry);
+		goto exit;
 	}
 
+	/*
+	 * Event is the only event in cache group.
+	 */
+
+	event_set_monr(event, NULL);
+	list_del(&event->hw.cqm_event_groups_entry);
+
+	if (monr__is_root(monr))
+		goto exit;
+
+	/* Transition all pmonrs to (U)state. */
+	monr_hrchy_acquire_locks(flags, i);
+
+	cqm_pkg_id_for_each_online(i)
+		__pmonr__to_ustate(monr->pmonrs[i]);
+
+	__monr__clear_mon_active(monr);
+	monr->mon_event_group = NULL;
+	__monr_hrchy_remove_leaf(monr);
+	monr_hrchy_release_locks(flags, i);
+
+	monr_dealloc(monr);
+exit:
 	mutex_unlock(&cqm_mutex);
 }
 
 static int intel_cqm_event_init(struct perf_event *event)
 {
 	struct perf_event *group = NULL;
+	int ret;
 
 	if (event->attr.type != intel_cqm_pmu.type)
 		return -ENOENT;
@@ -488,7 +892,11 @@ static int intel_cqm_event_init(struct perf_event *event)
 
 
 	/* Will also set rmid */
-	intel_cqm_setup_event(event, &group);
+	ret = intel_cqm_setup_event(event, &group);
+	if (ret) {
+		mutex_unlock(&cqm_mutex);
+		return ret;
+	}
 
 	if (group) {
 		list_add_tail(&event->hw.cqm_event_group_entry,
@@ -697,6 +1105,12 @@ static int __init intel_cqm_init(void)
 			goto error;
 	}
 
+	monr_hrchy_root = monr_alloc();
+	if (IS_ERR(monr_hrchy_root)) {
+		ret = PTR_ERR(monr_hrchy_root);
+		goto error;
+	}
+
 	/* Select the minimum of the maximum rmids to use as limit for
 	 * threshold. XXX: per-package threshold.
 	 */
@@ -705,6 +1119,7 @@ static int __init intel_cqm_init(void)
 			min_max_rmid = cqm_pkgs_data[i]->max_rmid;
 		intel_cqm_setup_pkg_prmid_pools(i);
 	}
+	monr_hrchy_root->flags |= MONR_MON_ACTIVE;
 
 	/*
 	 * A reasonable upper limit on the max threshold is the number
diff --git a/arch/x86/events/intel/cqm.h b/arch/x86/events/intel/cqm.h
index a25d49b..81092f2 100644
--- a/arch/x86/events/intel/cqm.h
+++ b/arch/x86/events/intel/cqm.h
@@ -45,14 +45,111 @@ static unsigned int __rmid_min_update_time = RMID_DEFAULT_MIN_UPDATE_TIME;
 
 static inline int cqm_prmid_update(struct prmid *prmid);
 
+/*
+ * union prmid_summary: Machine-size summary of a pmonr's prmid state.
+ * @value:		One-word accessor.
+ * @rmid:		rmid for prmid.
+ * @sched_rmid:		The rmid to write in the PQR MSR.
+ * @read_rmid:		The rmid to read occupancy from.
+ *
+ * The prmid_summaries are read atomically and without the need for LOCK
+ * instructions during event and group scheduling at task context switch.
+ * They are set when a pmonr changes state and allow lock-free fast paths for
+ * RMID scheduling and RMID reads in the common case where the pmonr does not
+ * need to change state.
+ * The combination of values in sched_rmid and read_rmid indicate the state of
+ * the associated pmonr (see pmonr comments) as follows:
+ *					pmonr state
+ *	      |	 (A)state	    (U)state
+ * ----------------------------------------------------------------------------
+ * sched_rmid |	pmonr.prmid	   INVALID_RMID
+ *  read_rmid |	pmonr.prmid	   INVALID_RMID
+ *				      (or 0)
+ *
+ * The combination sched_rmid == INVALID_RMID and read_rmid == 0 for (U)state
+ * denotes that the flag MONR_MON_ACTIVE is set in the monr associated with
+ * the pmonr for this prmid_summary.
+ */
+union prmid_summary {
+	long long	value;
+	struct {
+		u32	sched_rmid;
+		u32	read_rmid;
+	};
+};
+
 # define INVALID_RMID (-1)
 
+/* A pmonr in (U)state has no sched_rmid, read_rmid can be 0 or INVALID_RMID
+ * depending on whether monitoring is active or not.
+ */
+inline bool prmid_summary__is_ustate(union prmid_summary summ)
+{
+	return summ.sched_rmid == INVALID_RMID;
+}
+
+inline bool prmid_summary__is_mon_active(union prmid_summary summ)
+{
+	/* If not in (U)state, then MONR_MON_ACTIVE must be set. */
+	return summ.sched_rmid != INVALID_RMID ||
+	       summ.read_rmid == 0;
+}
+
+struct monr;
+
+/* struct pmonr: Node of per-package hierarchy of MONitored Resources.
+ * @prmid:			The prmid of this pmonr, when in (A)state.
+ * @rotation_entry:		List entry to attach to astate_pmonrs_lru
+ *				in pkg_data.
+ * @monr:			The monr that contains this pmonr.
+ * @pkg_id:			Auxiliary variable with the pkg id for this pmonr.
+ * @prmid_summary_atomic:	Atomic accesor to store a union prmid_summary
+ *				that represent the state of this pmonr.
+ *
+ * The pmonrs form a per-package hierarchy of prmids. Each one represents a
+ * resource to be monitored and can hold a prmid. Due to rmid scarcity,
+ * rmids can be recycled and rotated. When an rmid is not available for a
+ * pmonr, the pmonr utilizes the rmid of its lowest (A)state ancestor.
+ * A pmonr is always in one of the following states:
+ *   - (A)ctive:	Has @prmid assigned, @ancestor_pmonr must be NULL.
+ *   - (U)nused:	No @ancestor_pmonr and no @prmid, hence no available
+ *			prmid and no inherited one either. Not in rotation list.
+ *			This state is unschedulable and a prmid
+ *			should be found (either a free one or an ancestor's)
+ *			before scheduling a thread with a (U)state pmonr on
+ *			a cpu in this package.
+ *
+ * The state transitions are:
+ *   (U) : The initial state. Starts there after allocation.
+ *   (U) -> (A): If on first sched (or initialization) pmonr receives a prmid.
+ *   (A) -> (U): On destruction of monr.
+ *
+ * Each pmonr is contained by a monr.
+ */
+struct pmonr {
+
+	struct prmid				*prmid;
+
+	struct monr				*monr;
+	struct list_head			rotation_entry;
+
+	u16					pkg_id;
+
+	/* all writers are sync'ed by package's lock. */
+	atomic64_t				prmid_summary_atomic;
+};
+
 /*
  * struct pkg_data: Per-package CQM data.
  * @max_rmid:			Max rmid valid for cpus in this package.
  * @prmids_by_rmid:		Utility mapping between rmid values and prmids.
  *				XXX: Make it an array of prmids.
  * @free_prmid_pool:		Free prmids.
+ * @active_prmid_pool:		prmids associated with an (A)state pmonr.
+ * @nopmonr_limbo_prmid_pool:	prmids in limbo state that are not referenced
+ *				by a pmonr.
+ * @astate_pmonrs_lru:		pmonrs in (A)state. LRU in increasing order of
+ *				pmonr.last_enter_astate.
  * @pkg_data_mutex:		Hold for stability when modifying pmonrs
  *				hierarchy.
  * @pkg_data_lock:		Hold to protect variables that may be accessed
@@ -71,6 +168,12 @@ struct pkg_data {
 	 * Pools of prmids used in rotation logic.
 	 */
 	struct list_head	free_prmids_pool;
+	/* Can be modified during task switch with (U)state -> (A)state. */
+	struct list_head	active_prmids_pool;
+	/* Only modified during rotation logic and deletion. */
+	struct list_head	nopmonr_limbo_prmids_pool;
+
+	struct list_head	astate_pmonrs_lru;
 
 	struct mutex		pkg_data_mutex;
 	raw_spinlock_t		pkg_data_lock;
@@ -78,6 +181,52 @@ struct pkg_data {
 	int			rotation_cpu;
 };
 
+/*
+ * Flags for monr.
+ */
+#define MONR_MON_ACTIVE		0x1
+
+/*
+ * struct monr: MONitored Resource.
+ * @flags:		Flags field for monr (XXX: More flags will be added
+ *			with MBM).
+ * @mon_event_group:	The head of event's group that use this monr, if any.
+ * @parent:		Parent in monr hierarchy.
+ * @children:		List of children in monr hierarchy.
+ * @parent_entry:	Entry in parent's children list.
+ * @pmonrs:		Per-package pmonr for this monr.
+ *
+ * Each cgroup or thread that requires an RMID will have a corresponding
+ * monr in the system-wide hierarchy reflecting its position in the
+ * cgroup/thread hierarchy.
+ * A monr is assigned to every CQM event and/or monitored cgroup when
+ * monitoring is activated, and that instance's address does not change
+ * during the lifetime of the event or cgroup.
+ *
+ * On creation, the monr has flags cleared and all its pmonrs in (U)state.
+ * The flag MONR_MON_ACTIVE must be set to enable any transition out of
+ * (U)state to occur.
+ */
+struct monr {
+	u16				flags;
+	/* Back reference pointers */
+	struct perf_event		*mon_event_group;
+
+	struct monr			*parent;
+	struct list_head		children;
+	struct list_head		parent_entry;
+	struct pmonr			*pmonrs[PQR_MAX_NR_PKGS];
+};
+
+/*
+ * Root of the system-wide hierarchy of monrs.
+ * A per-package raw_spin_lock protects changes to the per-pkg elements of
+ * the monr hierarchy.
+ * To modify the monr hierarchy, the locks of all packages must be held,
+ * using the package id as the nesting parameter.
+ */
+extern struct monr *monr_hrchy_root;
+
 extern struct pkg_data *cqm_pkgs_data[PQR_MAX_NR_PKGS];
 
 static inline u16 __cqm_pkgs_data_next_online(u16 pkg_id)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5eb7dea..bf29258 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -120,7 +120,7 @@ struct hw_perf_event {
 		};
 #ifdef CONFIG_INTEL_RDT
 		struct { /* intel_cqm */
-			u32                     cqm_rmid;
+			void			*cqm_monr;
 			struct list_head	cqm_event_group_entry;
 			struct list_head	cqm_event_groups_entry;
 		};
-- 
2.8.0.rc3.226.g39d4020
