Message-ID: <1bfa475e-2f7c-31e0-5114-34ac9808e4d6@linux.intel.com>
Date:   Wed, 22 Jul 2020 15:26:44 -0400
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     acme@...hat.com, mingo@...nel.org, linux-kernel@...r.kernel.org,
        jolsa@...nel.org, eranian@...gle.com,
        alexander.shishkin@...ux.intel.com, ak@...ux.intel.com
Subject: Re: [PATCH V6 11/14] perf/x86/intel: Disable sample-read the slots
 and metrics events



On 7/21/2020 9:10 AM, Peter Zijlstra wrote:
> On Fri, Jul 17, 2020 at 07:05:51AM -0700, kan.liang@...ux.intel.com wrote:
>> From: Kan Liang <kan.liang@...ux.intel.com>
>>
>> Users fail to sample-read the slots and metrics events, e.g.,
>> perf record -e '{slots, topdown-retiring}:S'.
>>
>> When reading the metrics event, the fixed counter 3 (slots) has to be
>> reset, which impacts the sampling of the slots event.
>>
>> Add specific validate_group() support to reject this case and error out
>> for Ice Lake.
>>
>> An alternative fix would be to unconditionally disable slots sampling,
>> but that's not a good fix: users may want to sample only the slots event,
>> without the topdown metrics events.
>>
>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
> 
> I'm confused by this; it doesn't make sense.
> 
> Should not patch 7 have something like the below instead?
> 
> Also, I think there is a bug when we create a group like this and then
> kill the leader, in that case the core code will 'promote' the sibling
> metric events to their own individual events, see perf_group_detach().
> 
> We need additional code to move those events into unrecoverable ERROR
> state. A new group_caps flag could indicate this promotion isn't
> allowed.

Yes, I can reproduce the bug with a custom program. The patch below fixes
it; I will add it to the upcoming V7 patch set.
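
For reference, the reproducer boils down to something like the sketch
below (not the exact program; the raw encodings are placeholders for the
slots and topdown-retiring events, the real ones are exported via
/sys/bus/event_source/devices/cpu/events). It opens the metric event as a
sibling of the slots leader, closes the leader, and then reads the
orphaned sibling:

#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
			   int cpu, int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int leader, sibling;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	attr.config = 0x0400;		/* placeholder: slots encoding */
	leader = perf_event_open(&attr, 0, -1, -1, 0);
	if (leader < 0)
		return 1;

	attr.config = 0x8000;		/* placeholder: topdown-retiring encoding */
	sibling = perf_event_open(&attr, 0, -1, leader, 0);
	if (sibling < 0)
		return 1;

	/*
	 * Closing the leader triggers perf_group_detach(), which
	 * "promotes" the sibling metric event to a singleton event.
	 */
	close(leader);

	/*
	 * Without the fix, this read still succeeds and returns a stale,
	 * meaningless metric value. With PERF_EV_CAP_COEXIST, the sibling
	 * is moved to the ERROR state instead.
	 */
	if (read(sibling, &count, sizeof(count)) == sizeof(count))
		printf("metric value after leader close: %llu\n",
		       (unsigned long long)count);

	close(sibling);
	return 0;
}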

From c90d4c19b24392e6bd9ff3c6d23d0d300bfa774f Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@...ux.intel.com>
Date: Wed, 22 Jul 2020 12:06:32 -0700
Subject: [PATCH] perf/core: Add a new PERF_EV_CAP_COEXIST event capability

Perf currently assumes that the events in a group are independent:
closing one event doesn't impact the value of the other events in the
same group. If the closed event is a sibling, the remaining events keep
running as a group after the closure. If the closed event is the leader,
the remaining events keep running as singleton events.

However, this assumption no longer holds, e.g., for the TopDown slots
and metrics events. The slots and metrics events must coexist in the
same group; if the slots event is closed, the values of the metrics
events become invalid.

Add a new PERF_EV_CAP_COEXIST event capability to indicate the
relationship among group events.

If any event with the flag is detached from the group, split the group
into singleton events, and move the events that also have the flag to
the unrecoverable ERROR state.

The leader of a PERF_EV_CAP_COEXIST group has to be updated last, so
move perf_event__header_size(leader) to the end of perf_group_detach().

Suggested-by: Peter Zijlstra <peterz@...radead.org>
Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
---
  arch/x86/events/intel/core.c |  2 ++
  include/linux/perf_event.h   |  5 ++++
  kernel/events/core.c         | 52 +++++++++++++++++++++++++++++++++---
  3 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 35694f6337c9..b0ab638e48ee 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3659,6 +3659,8 @@ static int intel_pmu_hw_config(struct perf_event *event)
  				return -EINVAL;

  			event->hw.flags |= PERF_X86_EVENT_TOPDOWN;
+
+			event->event_caps |= PERF_EV_CAP_COEXIST;
  		}
  	}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a26f3c9589b7..aa60a1381aa1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -585,9 +585,14 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
   * PERF_EV_CAP_SOFTWARE: Is a software event.
  * PERF_EV_CAP_READ_ACTIVE_PKG: A CPU event (or cgroup event) that can be read
  * from any CPU in the package where it is active.
+ * PERF_EV_CAP_COEXIST: An event with this flag must coexist with other sibling
+ * events, which have the same flag. If any event with the flag is detached
+ * from the group, split the group into singleton events, and move the events
+ * with the flag to the unrecoverable ERROR state.
   */
  #define PERF_EV_CAP_SOFTWARE		BIT(0)
  #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
+#define PERF_EV_CAP_COEXIST		BIT(2)

  #define SWEVENT_HLIST_BITS		8
  #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7c436d705fbd..e35d549a356d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2133,10 +2133,28 @@ static inline struct list_head *get_event_list(struct perf_event *event)
  	return event->attr.pinned ? &ctx->pinned_active : &ctx->flexible_active;
  }

+/*
+ * If the event has PERF_EV_CAP_COEXIST capability,
+ * schedule it out and move it into the ERROR state.
+ */
+static inline void perf_remove_coexist_events(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+	if (!(event->event_caps & PERF_EV_CAP_COEXIST))
+		return;
+
+	event_sched_out(event, cpuctx, ctx);
+	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+}
+
  static void perf_group_detach(struct perf_event *event)
  {
+	struct perf_event *leader = event->group_leader;
  	struct perf_event *sibling, *tmp;
  	struct perf_event_context *ctx = event->ctx;
+	bool cap_coexist = !!(event->event_caps & PERF_EV_CAP_COEXIST);

  	lockdep_assert_held(&ctx->lock);

@@ -2150,15 +2168,25 @@ static void perf_group_detach(struct perf_event *event)

  	perf_put_aux_event(event);

+	/*
+	 * If a PERF_EV_CAP_COEXIST event is detached,
+	 * split the group into singleton events.
+	 */
+	if (cap_coexist) {
+		event = leader;
+		goto split_group;
+	}
+
  	/*
  	 * If this is a sibling, remove it from its group.
  	 */
-	if (event->group_leader != event) {
+	if (leader != event) {
  		list_del_init(&event->sibling_list);
  		event->group_leader->nr_siblings--;
  		goto out;
  	}

+split_group:
  	/*
  	 * If this was a group event with sibling events then
  	 * upgrade the siblings to singleton events by adding them
@@ -2172,6 +2200,10 @@ static void perf_group_detach(struct perf_event *event)
  		/* Inherit group flags from the previous leader */
  		sibling->group_caps = event->group_caps;

+		/* Remove sibling PERF_EV_CAP_COEXIST event */
+		if (cap_coexist)
+			perf_remove_coexist_events(sibling);
+
  		if (!RB_EMPTY_NODE(&event->group_node)) {
  			add_event_to_groups(sibling, event->ctx);

@@ -2181,12 +2213,24 @@ static void perf_group_detach(struct perf_event *event)

  		WARN_ON_ONCE(sibling->ctx != event->ctx);
  	}
-
  out:
-	perf_event__header_size(event->group_leader);

-	for_each_sibling_event(tmp, event->group_leader)
+	for_each_sibling_event(tmp, leader)
  		perf_event__header_size(tmp);
+
+	/*
+	 * Change the leader of a PERF_EV_CAP_COEXIST group into
+	 * a singleton event. If the leader is a PERF_EV_CAP_COEXIST
+	 * event as well, remove it.
+	 */
+
+	if (cap_coexist) {
+		list_del_init(&leader->sibling_list);
+		leader->group_leader->nr_siblings = 0;
+		perf_remove_coexist_events(leader);
+	}
+
+	perf_event__header_size(leader);
  }

  static bool is_orphaned_event(struct perf_event *event)
-- 
2.17.1


> 
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3441,8 +3441,22 @@ static int intel_pmu_hw_config(struct pe
>   	 * A flag PERF_X86_EVENT_TOPDOWN is applied for the case.
>   	 */
>   	if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
> -		if (is_metric_event(event) && is_sampling_event(event))
> -			return -EINVAL;
> +
> +		if (is_metric_event(event)) {
> +			struct perf_event *leader = event->group_leader;
> +
> +			if (is_sampling_event(event))
> +				return -EINVAL;
> +
> +			if (leader == event)
> +				return -EINVAL;
> +
> +			if (!is_slots_event(leader))
> +				return -EINVAL;
> +
> +			if (is_sampling_event(leader))
> +				return -EINVAL;
> +		}

The leader doesn't need to be a slots event; e.g., {cycles, slots,
topdown_fe_bound} is also a valid group. But the slots event must come
before all metric events.

So I will use the code below to replace the above in V7.


+		if (is_metric_event(event)) {
+			struct perf_event *leader = event->group_leader;
+			struct perf_event *sibling;
+
+			/* The metric events don't support sampling. */
+			if (is_sampling_event(event))
+				return -EINVAL;
+
+			/* The metric events cannot be a group leader. */
+			if (leader == event)
+				return -EINVAL;
+
+			/*
+			 * The slots event cannot be the leader of a topdown
+			 * sample-read group, e.g., {slots, topdown-retiring}:S
+			 */
+			if (is_slots_event(leader) && is_sampling_event(leader))
+				return -EINVAL;
+
+			/*
+			 * The slots event must be before the metric events,
+			 * because we only update the values of a topdown
+			 * group once with the slots event.
+			 */
+			if (!is_slots_event(leader)) {
+				for_each_sibling_event(sibling, leader) {
+					if (is_slots_event(sibling))
+						break;
+					if (is_metric_event(sibling))
+						return -EINVAL;
+				}
+			}
+		}
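
To illustrate the intended behavior with the usual sysfs event names
(assuming perf stat as the front end):

	# accepted: slots comes before the metric events
	perf stat -e '{slots,topdown-retiring}' -a sleep 1
	perf stat -e '{cycles,slots,topdown-retiring}' -a sleep 1

	# rejected with -EINVAL: metric event before the slots event
	perf stat -e '{cycles,topdown-retiring,slots}' -a sleep 1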



Thanks,
Kan
