linux-kernel - Re: [PATCH 1/2] perf/core: Share an event with multiple cgroups


Message-ID: <CAM9d7chtYw0v49Q5ue6B=D_8kV6ZyMvT7p10_jxsHMc+H309tA@mail.gmail.com>
Date:   Wed, 31 Mar 2021 00:11:24 +0900
From:   Namhyung Kim <namhyung@...nel.org>
To:     Song Liu <songliubraving@...com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Jiri Olsa <jolsa@...hat.com>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Ian Rogers <irogers@...gle.com>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH 1/2] perf/core: Share an event with multiple cgroups

On Tue, Mar 30, 2021 at 3:33 PM Song Liu <songliubraving@...com> wrote:
> > On Mar 29, 2021, at 4:33 AM, Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > On Mon, Mar 29, 2021 at 2:17 AM Song Liu <songliubraving@...com> wrote:
> >>> On Mar 23, 2021, at 9:21 AM, Namhyung Kim <namhyung@...nel.org> wrote:
> >>>
> >>> As we can run many jobs (in container) on a big machine, we want to
> >>> measure each job's performance during the run.  To do that, the
> >>> perf_event can be associated to a cgroup to measure it only.
> >>>
>
> [...]
>
> >>> +     return 0;
> >>> +}
> >>
> >> Could you please explain why we need this logic in can_attach?
> >
> > IIUC the ss->attach() is called after a task's cgroup membership
> > is changed.  But we want to collect the performance numbers for
> > the old cgroup just before the change.  As the logic merely checks
> > the current task's cgroup, it should be done in the can_attach()
> > which is called before the cgroup change.
>
> Thanks for the explanations.
>
> Overall, I really like the core idea, especially that the overhead on
> context switch is bounded (by the depth of cgroup tree).

Thanks!

>
> Is it possible to make PERF_EVENT_IOC_ATTACH_CGROUP more flexible?
> Specifically, if we can have
>
>   PERF_EVENT_IOC_ADD_CGROUP     add a cgroup to the list
>   PERF_EVENT_IOC_DEL_CGROUP     delete a cgroup from the list
>
> we can probably share these events among multiple processes, and
> these processes don't need to know others' cgroup list. I think
> this will be useful for users to build customized monitoring in
> their own containers.
>
> Does this make sense?

Maybe we can add an ADD/DEL interface for more flexible monitoring,
but I'm not sure which use cases would actually need it.

For your multi-process sharing case, the original events' file
descriptors would need to be shared first.  Also, adding and deleting
(or even just reading) arbitrary cgroups from a container can be a
security concern IMHO.

So I just focused on the single-process multi-cgroup case, which is
already in use (perf stat --for-each-cgroup) and very important in my
company's setup.  In this case we have the list of cgroups of interest
from the beginning, so it's more efficient to create a properly sized
hash table and all the nodes at once.
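
To illustrate the "properly sized hash table and all the nodes at once"
point, here is a rough userspace sketch.  It is not the code from the
patch: the struct and function names (cgrp_node, cgrp_table,
cgrp_table_create and so on), the 64-bit cgroup id and the hash function
are all assumptions made for the example.  The idea is only that when
the full cgroup list is known up front, the bucket array can be sized
once and every node can come from a single allocation, so nothing needs
to be resized or allocated later.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-cgroup node; not the structure used by the patch. */
struct cgrp_node {
	uint64_t id;             /* cgroup id (assumed to be 64-bit) */
	uint64_t count;          /* accumulated event count */
	struct cgrp_node *next;  /* next node in the same bucket */
};

/* Hypothetical table built once from a fixed list of cgroup ids. */
struct cgrp_table {
	size_t nr_buckets;           /* always a power of two */
	size_t nr_nodes;
	struct cgrp_node **buckets;
	struct cgrp_node *nodes;     /* all nodes live in one allocation */
};

/* Smallest power of two >= n (n is assumed to be reasonably small). */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Simple multiplicative hash; any reasonable hash would do here. */
static size_t hash_id(uint64_t id, size_t nr_buckets)
{
	id *= 0x9E3779B97F4A7C15ULL;
	return (size_t)(id >> 32) & (nr_buckets - 1);
}

/* Size the table from the list length and link all nodes in one pass. */
static struct cgrp_table *cgrp_table_create(const uint64_t *ids, size_t n)
{
	struct cgrp_table *t;
	size_t i;

	if (n == 0)
		return NULL;

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;

	t->nr_buckets = roundup_pow2(n);
	t->nr_nodes = n;
	t->buckets = calloc(t->nr_buckets, sizeof(*t->buckets));
	t->nodes = calloc(n, sizeof(*t->nodes));
	if (!t->buckets || !t->nodes) {
		free(t->buckets);
		free(t->nodes);
		free(t);
		return NULL;
	}

	for (i = 0; i < n; i++) {
		size_t b = hash_id(ids[i], t->nr_buckets);

		t->nodes[i].id = ids[i];
		t->nodes[i].next = t->buckets[b];
		t->buckets[b] = &t->nodes[i];
	}
	return t;
}

/* Look up the node for a cgroup id; NULL if it is not being monitored. */
static struct cgrp_node *cgrp_table_find(const struct cgrp_table *t,
					 uint64_t id)
{
	struct cgrp_node *node = t->buckets[hash_id(id, t->nr_buckets)];

	while (node && node->id != id)
		node = node->next;
	return node;
}

static void cgrp_table_destroy(struct cgrp_table *t)
{
	if (!t)
		return;
	free(t->buckets);
	free(t->nodes);
	free(t);
}
```

With an ADD/DEL style interface the table would instead have to grow and
shrink at runtime, which is the extra cost being weighed above.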

Thanks,
Namhyung