linux-kernel - Perf event operation with hotplug cpus and cgroups

Message-ID: <550C70AF.8020304@redhat.com>
Date:	Fri, 20 Mar 2015 15:10:39 -0400
From:	William Cohen <wcohen@...hat.com>
To:	a.p.zijlstra@...llo.nl, paulus@...ba.org,
	Don Domingo <ddomingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Perf event operation with hotplug cpus and cgroups

The current perf event interface avoids complexity in the kernel by
making user space responsible for opening a file descriptor for
each cpu on which performance events are monitored.  However, there
are two use cases where this approach has issues: system-wide
measurements with hotplug cpus, and monitoring of cgroups.

hotplug cpus

Hotplug can dynamically change the number of cpus that are active
on the system.  If "perf stat -a ..." is started with some of the
processors offline and additional processors are brought online after
perf has started, no data is gathered from those newly onlined
processors.
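
As a rough illustration of the gap, the sketch below compares the set
of online cpus at start-up with the set seen later.  It assumes the
cpu list format exposed by /sys/devices/system/cpu/online (for example
"0-3,6"); the function names are hypothetical, and the parser is
limited to cpus 0..63 to keep it short.  Any bit left set in the result
is a cpu for which no event was opened.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a cpu list such as "0-3,6" (the format used by
 * /sys/devices/system/cpu/online) into a bitmask.
 * Sketch only: handles cpus 0..63 and returns 0 on a parse error. */
uint64_t parse_cpu_list(const char *list)
{
    uint64_t mask = 0;
    const char *p = list;

    assert(list != NULL);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p)
            return 0;           /* no digits where a cpu number was expected */
        p = end;
        if (*p == '-') {        /* a range "lo-hi" */
            p++;
            hi = strtol(p, &end, 10);
            if (end == p)
                return 0;
            p = end;
        }
        if (lo < 0 || hi > 63 || lo > hi)
            return 0;
        for (long cpu = lo; cpu <= hi; cpu++)
            mask |= (uint64_t)1 << cpu;
        if (*p == ',')
            p++;
    }
    return mask;
}

/* Cpus that are online now but were offline when the events were opened. */
uint64_t unmonitored_cpus(uint64_t online_at_start, uint64_t online_now)
{
    return online_now & ~online_at_start;
}
```

A tool that wanted to cover newly onlined cpus would have to re-read
the online list periodically and open events for every bit returned by
unmonitored_cpus().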

cgroup monitoring

Cgroup monitoring is built on the perf event per-cpu monitoring.
If the cgroup is not pinned to a particular set of processors, then
system-wide monitoring for that cgroup needs to be done and a perf
event open is needed for every cpu in the system.  The issue with this
approach arises when cgroups are used for virtual machine guests where
each cgroup is allocated a single processor: the number of cgroups is
then proportional to the number of processors in the machine, and the
number of files that need to be opened to monitor the cgroups on the
system is O(cpus^2).  For a large system with 80 cpus that would be
6400 files, far more than the default ulimit settings allow, and a
huge number of syscalls is needed to read out the information.  If one
limits the number of files opened for performance monitoring by
pinning cgroups to particular processors, any change in the pinning of
cgroups to processors will make the measurement incorrect.
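
The arithmetic behind the 6400 figure can be sketched as follows,
assuming one file descriptor per (cgroup, cpu) pair and one cgroup per
cpu.  The helper names are hypothetical.  The second function checks
the count against the soft RLIMIT_NOFILE of the calling process, which
is often 1024 on Linux distributions (an assumption; check "ulimit -n"
on the system in question).

```c
#include <assert.h>
#include <sys/resource.h>

/* One perf event file descriptor per (cgroup, cpu) pair. */
long perf_fds_needed(long ncgroups, long ncpus)
{
    assert(ncgroups >= 0 && ncpus >= 0);
    return ncgroups * ncpus;
}

/* Returns 1 if nfds descriptors fit under the soft RLIMIT_NOFILE of
 * this process, 0 if they do not, and -1 if the limit cannot be read. */
int fits_in_nofile_limit(long nfds)
{
    struct rlimit rl;

    assert(nfds >= 0);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t)nfds <= rl.rlim_cur;
}
```

With 80 cgroups on 80 cpus, perf_fds_needed(80, 80) gives 6400, and
each of those descriptors also needs its own read() to collect the
counts.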

Given the issues with these use cases, is having user space set up the
counters for each cpu in the system the best solution?  Would it be
better to allow system-wide data collection to be selected with
one perf event open with pid==-1 and cpu==-1?  Is setup of per-cpu
monitoring and aggregation of the counters across processors too
difficult to do in the kernel?
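
For reference, a minimal sketch of what user space does today: one
perf_event_open() call per cpu with pid==-1.  The perf_event_open(2)
man page documents pid==-1 with cpu==-1 as invalid, so the single call
asked about above currently returns an error.  The helper names and
the choice of a cpu-cycles counter are illustrative assumptions, and
system-wide events normally require sufficient privileges (see
perf_event_paranoid).

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for this syscall, so call it directly. */
static int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Open a cpu-cycles counter for the given pid/cpu pair.
 * Returns a file descriptor, or -1 with errno set. */
int open_cycles_counter(pid_t pid, int cpu)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    return sys_perf_event_open(&attr, pid, cpu, -1, 0);
}

/* System-wide monitoring as done today: one event per cpu, pid == -1.
 * fds[] must have room for ncpus entries; returns how many opens succeeded. */
int open_on_each_cpu(int *fds, int ncpus)
{
    int opened = 0;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        fds[cpu] = open_cycles_counter(-1, cpu);
        if (fds[cpu] >= 0)
            opened++;
    }
    return opened;
}
```

Calling open_cycles_counter(-1, -1) fails with the current interface,
which is why the per-cpu loop (and the matching per-cpu reads and
aggregation) ends up in user space.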

-Will