Message-ID: <alpine.DEB.2.10.1604291405560.18257@vshiva-Udesk>
Date: Fri, 29 Apr 2016 14:06:24 -0700 (PDT)
From: Vikas Shivappa <vikas.shivappa@...ux.intel.com>
To: David Carrillo-Cisneros <davidcc@...gle.com>
cc: Peter Zijlstra <peterz@...radead.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
Matt Fleming <matt.fleming@...el.com>,
Tony Luck <tony.luck@...el.com>,
Stephane Eranian <eranian@...gle.com>,
Paul Turner <pjt@...gle.com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/32] 2nd Iteration of Cache QoS Monitoring support.
On Thu, 28 Apr 2016, David Carrillo-Cisneros wrote:
> This series introduces the next iteration of kernel support for the
> Cache QoS Monitoring (CQM) technology available in Intel Xeon processors.
Which kernel version does this series compile against?
Thanks,
Vikas
>
> One of the main limitations of the previous version is the inability
> to simultaneously monitor:
> 1) cpu event and any other event in that cpu.
> 2) cgroup events for cgroups in same descendancy line.
> 3) cgroup events and any thread event of a cgroup in the same
> descendancy line.
>
> Another limitation is that monitoring for a cgroup was enabled/disabled by
> the existence of a perf event for that cgroup. Since the event
> llc_occupancy measures changes in occupancy rather than total occupancy,
> in order to read meaningful llc_occupancy values, an event should be
> enabled for a long enough period of time. The overhead in context switches
> caused by the perf events is undesired in some sensitive scenarios.
>
> This series of patches addresses the shortcomings mentioned above and
> adds some other improvements. The main changes are:
> - No more potential conflicts between different events. New
> version builds a hierarchy of RMIDs that captures the dependency
> between monitored cgroups. llc_occupancy for cgroup is the sum of
> llc_occupancies for that cgroup RMID and all other RMIDs in the
> cgroups subtree (both monitored cgroups and threads).
>
> - A cgroup integration that allows monitoring a cgroup without
> creating a perf event, decreasing the context switch overhead.
> Monitoring is controlled by a boolean cgroup subsystem attribute
> in each perf cgroup, this is:
>
> echo 1 > cgroup_path/perf_event.cqm_cont_monitoring
>
> starts CQM monitoring whether or not there is a perf_event
> attached to the cgroup. Setting the attribute to 0 makes
> monitoring dependent on the existence of a perf_event.
> A perf_event is always required in order to read llc_occupancy.
> This cgroup integration uses Intel's PQR code and is intended to
> be used by upcoming versions of Intel's CAT.
>
> - A more stable rotation algorithm: New algorithm uses SLOs that
> guarantee:
> - A minimum of enabled time for monitored cgroups and
> threads.
> - A maximum time disabled before error is introduced by
> reusing dirty RMIDs.
> - A minimum rate at which RMID recycling must progress.
>
> - Reduced impact of stealing/rotation of RMIDs: The new algorithm
> accounts the residual occupancy held by limbo RMIDs towards the
> former owner of the limbo RMID, decreasing the error introduced
> by RMID rotation.
> It also allows a limbo RMID to be reused by its former owner when
> appropriate, decreasing the potential error of reusing dirty RMIDs
> and allowing progress even if most limbo RMIDs do not
> drop occupancy fast enough.
>
> - Elimination of pmu::count: perf generic's perf_event_count()
> performs a quick add of atomic types. The introduction of
> pmu::count in the previous CQM series to read occupancy for thread
> events changed the behavior of perf_event_count() by performing a
> potentially slow IPI and write/read to MSR. It also made pmu::read
> behave differently depending on whether the event was a
> cpu/cgroup event or a thread. This patch series removes the custom
> pmu::count from CQM and provides a consistent behavior for all
> calls of perf_event_read.
>
> - Added error return for pmu::read: Reads to CQM events may fail
> due to stealing of RMIDs, even after successfully adding an event
> to a PMU. This patch series expands pmu::read with an int return
> value and propagates the error to callers that can fail
> (i.e. perf_read).
> The ability of pmu::read to fail is consistent with the recent
> changes that allow perf_event_read to fail for transactional
> reading of event groups.
>
> - Introduces the field pmu_event_flags that contains flags set by
> the PMU to signal variations on the default behavior to perf's
> generic code. In this series, three flags are introduced:
> - PERF_CGROUP_NO_RECURSION: Signals generic code not to add
> events of the cgroup ancestors of a cgroup.
> - PERF_INACTIVE_CPU_READ_PKG: Signals generic code that
> this CPU event can be read in any CPU in its event::cpu's
> package, even if the event is not active.
> - PERF_INACTIVE_EV_READ_ANY_CPU: Signals generic code that
> this event can be read in any CPU in any package in the
> system even if the event is not active.
> Using the above flags takes advantage of the CQM's hw ability to
> read llc_occupancy even when the associated perf event is not
> running in a CPU.
>
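For illustration only (this is not code from the series): the first item in the list above says that llc_occupancy for a cgroup is the sum over its own RMID and every RMID in its subtree. A minimal user-space sketch of that summation, assuming a hypothetical `struct rmid_node` tree with child/sibling links, might look like the following. The structure and function names are made up for this example and do not correspond to anything in cqm.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node in an RMID hierarchy; not a structure from the patches. */
struct rmid_node {
	uint64_t occupancy;             /* occupancy attributed to this node's own RMID */
	struct rmid_node *first_child;  /* first monitored descendant, or NULL */
	struct rmid_node *next_sibling; /* next node under the same parent, or NULL */
};

/*
 * Return the occupancy of a node plus the occupancy of every node in its
 * subtree (monitored cgroups and threads alike in this simplified model).
 */
static uint64_t subtree_occupancy(const struct rmid_node *node)
{
	const struct rmid_node *child;
	uint64_t total;

	if (node == NULL)
		return 0;

	total = node->occupancy;
	for (child = node->first_child; child != NULL; child = child->next_sibling)
		total += subtree_occupancy(child);

	return total;
}
```

The recursion is only for brevity; kernel code would more likely walk the tree iteratively and under the appropriate locks.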
> This patch series also updates the perf tool to fix error handling and to
> better handle the idiosyncrasies of snapshot and per-pkg events.
>
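For illustration only (again, not code from the series): the cover letter describes giving pmu::read an int return value so that a read can fail, for example after an RMID has been stolen, and so that the error reaches the caller. A rough sketch of that pattern, using hypothetical names (`sketch_event`, `sketch_pmu_read`, `sketch_perf_read`) and an errno picked arbitrarily for the example, could be:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified event; the real struct perf_event is far larger. */
struct sketch_event {
	int has_rmid;    /* becomes 0 once the event's RMID has been stolen */
	uint64_t count;  /* last value successfully read */
};

/*
 * Hypothetical read callback: returns 0 on success or a negative errno.
 * hw_value stands in for the value that would be read from hardware.
 */
static int sketch_pmu_read(struct sketch_event *ev, uint64_t hw_value)
{
	if (!ev->has_rmid)
		return -EAGAIN; /* errno chosen for illustration only */

	ev->count = hw_value;
	return 0;
}

/* Hypothetical caller: propagates the error instead of reporting stale data. */
static int sketch_perf_read(struct sketch_event *ev, uint64_t hw_value,
			    uint64_t *out)
{
	int err = sketch_pmu_read(ev, hw_value);

	if (err)
		return err;

	*out = ev->count;
	return 0;
}
```

On failure the caller leaves `*out` untouched, so a consumer can tell "no valid reading" apart from a stale or zero count.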
> David Carrillo-Cisneros (31):
> perf/x86/intel/cqm: temporarily remove MBM from CQM and cleanup
> perf/x86/intel/cqm: remove check for conflicting events
> perf/x86/intel/cqm: remove all code for rotation of RMIDs
> perf/x86/intel/cqm: make read of RMIDs per package (Temporal)
> perf/core: remove unused pmu->count
> x86/intel,cqm: add CONFIG_INTEL_RDT configuration flag and refactor
> PQR
> perf/x86/intel/cqm: separate CQM PMU's attributes from x86 PMU
> perf/x86/intel/cqm: prepare for next patches
> perf/x86/intel/cqm: add per-package RMIDs, data and locks
> perf/x86/intel/cqm: basic RMID hierarchy with per package rmids
> perf/x86/intel/cqm: (I)state and limbo prmids
> perf/x86/intel/cqm: add per-package RMID rotation
> perf/x86/intel/cqm: add polled update of RMID's llc_occupancy
> perf/x86/intel/cqm: add preallocation of anodes
> perf/core: add hooks to expose architecture specific features in
> perf_cgroup
> perf/x86/intel/cqm: add cgroup support
> perf/core: adding pmu::event_terminate
> perf/x86/intel/cqm: use pmu::event_terminate
> perf/core: introduce PMU event flag PERF_CGROUP_NO_RECURSION
> x86/intel/cqm: use PERF_CGROUP_NO_RECURSION in CQM
> perf/x86/intel/cqm: handle inherit event and inherit_stat flag
> perf/x86/intel/cqm: introduce read_subtree
> perf/core: introduce PERF_INACTIVE_*_READ_* flags
> perf/x86/intel/cqm: use PERF_INACTIVE_*_READ_* flags in CQM
> sched: introduce the finish_arch_pre_lock_switch() scheduler hook
> perf/x86/intel/cqm: integrate CQM cgroups with scheduler
> perf/core: add perf_event cgroup hooks for subsystem attributes
> perf/x86/intel/cqm: add CQM attributes to perf_event cgroup
> perf,perf/x86,perf/powerpc,perf/arm,perf/*: add int error return to
> pmu::read
> perf,perf/x86: add hook perf_event_arch_exec
> perf/stat: revamp error handling for snapshot and per_pkg events
>
> Stephane Eranian (1):
> perf/stat: fix bug in handling events in error state
>
> arch/alpha/kernel/perf_event.c | 3 +-
> arch/arc/kernel/perf_event.c | 3 +-
> arch/arm64/include/asm/hw_breakpoint.h | 2 +-
> arch/arm64/kernel/hw_breakpoint.c | 3 +-
> arch/metag/kernel/perf/perf_event.c | 5 +-
> arch/mips/kernel/perf_event_mipsxx.c | 3 +-
> arch/powerpc/include/asm/hw_breakpoint.h | 2 +-
> arch/powerpc/kernel/hw_breakpoint.c | 3 +-
> arch/powerpc/perf/core-book3s.c | 11 +-
> arch/powerpc/perf/core-fsl-emb.c | 5 +-
> arch/powerpc/perf/hv-24x7.c | 5 +-
> arch/powerpc/perf/hv-gpci.c | 3 +-
> arch/s390/kernel/perf_cpum_cf.c | 5 +-
> arch/s390/kernel/perf_cpum_sf.c | 3 +-
> arch/sh/include/asm/hw_breakpoint.h | 2 +-
> arch/sh/kernel/hw_breakpoint.c | 3 +-
> arch/sparc/kernel/perf_event.c | 2 +-
> arch/tile/kernel/perf_event.c | 3 +-
> arch/x86/Kconfig | 6 +
> arch/x86/events/amd/ibs.c | 2 +-
> arch/x86/events/amd/iommu.c | 5 +-
> arch/x86/events/amd/uncore.c | 3 +-
> arch/x86/events/core.c | 3 +-
> arch/x86/events/intel/Makefile | 3 +-
> arch/x86/events/intel/bts.c | 3 +-
> arch/x86/events/intel/cqm.c | 3847 +++++++++++++++++++++---------
> arch/x86/events/intel/cqm.h | 519 ++++
> arch/x86/events/intel/cstate.c | 3 +-
> arch/x86/events/intel/pt.c | 3 +-
> arch/x86/events/intel/rapl.c | 3 +-
> arch/x86/events/intel/uncore.c | 3 +-
> arch/x86/events/intel/uncore.h | 2 +-
> arch/x86/events/msr.c | 3 +-
> arch/x86/include/asm/hw_breakpoint.h | 2 +-
> arch/x86/include/asm/perf_event.h | 41 +
> arch/x86/include/asm/pqr_common.h | 74 +
> arch/x86/include/asm/processor.h | 4 +
> arch/x86/kernel/cpu/Makefile | 4 +
> arch/x86/kernel/cpu/pqr_common.c | 43 +
> arch/x86/kernel/hw_breakpoint.c | 3 +-
> arch/x86/kvm/pmu.h | 10 +-
> drivers/bus/arm-cci.c | 3 +-
> drivers/bus/arm-ccn.c | 3 +-
> drivers/perf/arm_pmu.c | 3 +-
> include/linux/perf_event.h | 91 +-
> kernel/events/core.c | 170 +-
> kernel/sched/core.c | 1 +
> kernel/sched/sched.h | 3 +
> kernel/trace/bpf_trace.c | 5 +-
> tools/perf/builtin-stat.c | 43 +-
> tools/perf/util/counts.h | 19 +
> tools/perf/util/evsel.c | 44 +-
> tools/perf/util/evsel.h | 8 +-
> tools/perf/util/stat.c | 35 +-
> 54 files changed, 3746 insertions(+), 1337 deletions(-)
> create mode 100644 arch/x86/events/intel/cqm.h
> create mode 100644 arch/x86/include/asm/pqr_common.h
> create mode 100644 arch/x86/kernel/cpu/pqr_common.c
>
> --
> 2.8.0.rc3.226.g39d4020
>
>