[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1557954092-67275-1-git-send-email-kan.liang@linux.intel.com>
Date: Wed, 15 May 2019 14:01:28 -0700
From: kan.liang@...ux.intel.com
To: peterz@...radead.org, tglx@...utronix.de, mingo@...hat.com,
linux-kernel@...r.kernel.org
Cc: eranian@...gle.com, tj@...nel.org, mark.rutland@....com,
irogers@...gle.com, ak@...ux.intel.com,
Kan Liang <kan.liang@...ux.intel.com>
Subject: [PATCH V2 0/4] Optimize cgroup context switch
From: Kan Liang <kan.liang@...ux.intel.com>
Changes since V1:
- Add new event_type to indicate cgroup only switch
Add cgrp_event_type to track event type of a cgroup
Extend ctx_pinned/flexible_sched_in and struct sched_in_data to pass
the event_type
- If the new cgroup has pinned events, schedule out all flexible events
before sched in all events.
- Add macro and helper function to replace duplicated content in patch 1
- Add new RB tree keys, cgrp_id and cgrp_group_index, for cgroup.
Now, cgrp_id is the same as css subsys-unique ID.
- Add per-cpu pinned/flexible_event in perf_cgroup to track the left most
event for a cgroup.
- Add per-cpu rotated_event in perf_cgroup to handle multiplexing.
Disable fast path for multiplexing.
- Support hierarchies
- Update test result. Test with different hierarchy.
On systems with very high context switch rates between cgroups,
there are high overhead using cgroup perf.
Current codes have two issues.
- System-wide events are mistakenly switched in cgroup
context switch. It causes system-wide events miscounting,
and brings avoidable overhead.
Patch 1 fixes the issue.
- The cgroup context switch sched_in is low efficient.
All cgroup events share the same per-cpu pinned/flexible groups.
The RB trees for pinned/flexible groups don't understand cgroup.
Current code has to traverse all events, and use event_filter_match()
to filter the events for specific cgroup.
Patch 2-4 adds a fast path for cgroup context switch sched_in by
training the RB tree to understand cgroup. The extra filtering
can be avoided.
Here is test with 6 child cgroups (sibling cgroups), 1 parent cgroup
and system-wide events.
A specjbb benchmark is running in each child cgroup.
The perf command is as below.
perf stat -e cycles,instructions -e cycles,instructions
-e cycles,instructions -e cycles,instructions
-e cycles,instructions -e cycles,instructions
-e cycles,instructions -e cycles,instructions
-G cgroup1,cgroup1,cgroup2,cgroup2,cgroup3,cgroup3
-G cgroup4,cgroup4,cgroup5,cgroup5,cgroup6,cgroup6
-G cgroup_parent,cgroup_parent
-a -e cycles,instructions -I 1000
The average RT (Response Time) reported from specjbb is
used as key performance metrics. (The lower the better)
RT(us) Overhead
Baseline (no perf stat): 4286.9
Use cgroup perf, no patches: 4537.1 5.84%
Use cgroup perf, apply patch 1: 4440.7 3.59%
Use cgroup perf, apple all patches: 4403.5 2.72%
Kan Liang (4):
perf: Fix system-wide events miscounting during cgroup monitoring
perf: Add filter_match() as a parameter for pinned/flexible_sched_in()
perf cgroup: Add new RB tree keys for cgroup
perf cgroup: Add fast path for cgroup switch
include/linux/perf_event.h | 6 +
kernel/events/core.c | 427 ++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 394 insertions(+), 39 deletions(-)
--
2.7.4
Powered by blists - more mailing lists