Message-ID: <20161229014138.GB2341@templeofstupid.com>
Date: Wed, 28 Dec 2016 17:41:38 -0800
From: Krister Johansen <kjlx@...pleofstupid.com>
To: Hari Bathini <hbathini@...ux.vnet.ibm.com>
Cc: ast@...com, peterz@...radead.org,
lkml <linux-kernel@...r.kernel.org>, acme@...nel.org,
alexander.shishkin@...ux.intel.com, mingo@...hat.com,
daniel@...earbox.net, rostedt@...dmis.org,
Ananth N Mavinakayanahalli <ananth@...ux.vnet.ibm.com>,
ebiederm@...ssion.com, sargun@...gun.me,
Aravinda Prasad <aravinda@...ux.vnet.ibm.com>,
brendan.d.gregg@...il.com
Subject: Re: [PATCH v4 0/3] perf: add support for analyzing events for
containers

On Fri, Dec 16, 2016 at 12:06:55AM +0530, Hari Bathini wrote:
> This patch-set overcomes this limitation by using cgroup identifier as
> container unique identifier. A new PERF_RECORD_NAMESPACES event that
> records namespaces related info is introduced, from which the cgroup
> namespace's device & inode numbers are used as cgroup identifier. This
> is based on the assumption that each container is created with its own
> cgroup namespace allowing assessment/analysis of multiple containers
> using cgroup identifier.

Why choose cgroups when the kernel dispenses namespace-unique
identifiers? Cgroup membership can be arbitrary. Moreover, cgroup and
namespace destruction are handled by separate subsystems; it's possible
for a cgroup notifier to run before network namespace teardown occurs.
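
To illustrate what I mean (a rough sketch; the chosen namespace and the
printed numbers are only examples): every /proc/<pid>/ns/<type> link
already resolves to a unique inode, and stat(2) on the same path yields
that inode plus the nsfs device number, so the pair identifies a
namespace on its own:

/* Sketch: read the kernel's namespace identifiers for the current task.
 * Works the same for net, pid, mnt, cgroup, etc.
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/proc/self/ns/net";
	char link[64];
	struct stat st;
	ssize_t n;

	n = readlink(path, link, sizeof(link) - 1);
	if (n < 0 || stat(path, &st) < 0) {
		perror(path);
		return 1;
	}
	link[n] = '\0';
	/* Prints e.g. "net:[4026531993]"; the bracketed number matches
	 * st.st_ino. */
	printf("%s -> %s (dev=%lu ino=%lu)\n", path, link,
	       (unsigned long)st.st_dev, (unsigned long)st.st_ino);
	return 0;
}

Those numbers are valid for the life of the namespace and don't depend
on how the container's cgroups happen to be arranged.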

If it were me, I'd reuse an existing convention to identify the
namespaces you want to monitor. The code in nsenter(1) can take a
namespace that has been bind-mounted on a file, or extract the namespace
information from a task in /proc.
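
Something along these lines (sketch only; the pid and the netns path are
made up, and setns(2) requires CAP_SYS_ADMIN):

/* Sketch of the nsenter(1)-style convention: take a namespace reference
 * as a file descriptor, whether it comes from /proc/<pid>/ns/<type> or
 * from a namespace bind-mounted on a file, and enter it with setns(2).
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static int enter_net_ns(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		perror(path);
		return -1;
	}
	/* setns(2) accepts either form of reference. */
	if (setns(fd, CLONE_NEWNET) < 0) {
		perror("setns");
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

int main(void)
{
	/* A task-based reference ... */
	enter_net_ns("/proc/1234/ns/net");
	/* ... or a bind-mounted one, e.g. created by 'ip netns add'. */
	enter_net_ns("/var/run/netns/example");
	return 0;
}

Either way, the tool identifies the namespace by an explicit reference
the user hands it, rather than by inferring it from cgroup membership.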

My biggest concern is how the sample data is handled after it has been
collected. Neither namespaces nor cgroups survive a reboot. Will the
records contain all the persistent state needed to run a report or
script command at a later date?

Does this code attempt to enter alternate namespaces in order to record
stack/symbol information for a '-g' style trace? If so, how are you
holding on to that information? There's no guarantee that a particular
container will be alive or have its filesystems reachable from the host
if the trace data is evaluated at a later time.
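
To make that concrete: while the task is alive, its root filesystem is
reachable from the host, so a profiler can pull symbols at record time
with something like this (the pid and library path are made up):

/* Sketch: open a binary inside a live container's mount namespace via
 * /proc/<pid>/root.  Once the task and its mount namespace are gone,
 * nothing guarantees this path still resolves.
 */
#include <stdio.h>

int main(void)
{
	const char *path = "/proc/1234/root/usr/lib/libc.so.6";
	FILE *f = fopen(path, "rb");

	if (!f) {
		perror(path);
		return 1;
	}
	printf("opened %s for symbol extraction\n", path);
	fclose(f);
	return 0;
}

If the symbols aren't captured (or at least referenced by build-id) at
record time, a later 'perf report' run may have nothing to fall back on.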

-K