Message-ID: <20190211074633.GB49295@gmail.com>
Date: Mon, 11 Feb 2019 08:46:33 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Adrian Hunter <adrian.hunter@...el.com>,
Andi Kleen <ak@...ux.intel.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Jiri Olsa <jolsa@...hat.com>, Song Liu <songliubraving@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] perf, bpf: Retain kernel executable code in memory
 to aid Intel PT tracing

* Peter Zijlstra <peterz@...radead.org> wrote:

> On Thu, Feb 07, 2019 at 01:19:01PM +0200, Adrian Hunter wrote:
> > Subject to memory pressure and other limits, retain executable code, such
> > as JIT-compiled bpf, in memory instead of freeing it immediately it is no
> > longer needed for execution.
> >
> > While perf is primarily aimed at statistical analysis, tools like Intel
> > PT can aim to provide a trace of exactly what happened. As such, corner
> > cases that can be overlooked statistically need to be addressed. For
> > example, there is a gap where JIT-compiled bpf can be freed from memory
> > before a tracer has a chance to read it out through the bpf syscall.
> > While that can be ignored statistically, it contributes to a death by
> > 1000 cuts for tracers attempting to assemble exactly what happened. This is
> > a bit gratuitous given that retaining the executable code is relatively
> > simple, and the amount of memory involved relatively small. The retained
> > executable code is then available in memory images such as /proc/kcore.
> >
> > This facility could perhaps be extended also to init sections.
> >
> > Note that this patch is compile tested only and, at present, is missing
> > the ability to retain symbols.
>
> You don't need the symbols; you already have them through
> PERF_RECORD_KSYMBOL.
>
> Also; afaict this patch guarantees exactly nothing. It registers a
> shrinker which will (given enough memory pressure) happily free your
> text before we get around to copying it out.
>
> Did you read this proposal?
>
> https://lkml.kernel.org/r/20190109101808.GG1900@hirez.programming.kicks-ass.net
>
> (also: s/KCORE_QC/KCORE_QS/ for quiescent state)
>
> That would create an RCU like interface to /proc/kcore and give you the
> guarantees you need, while also allowing the memory to get freed once
> you've obtained a copy.

Yeah, adding a proper change-notification interface to /proc/kcore sounds
like a superior solution to trying to shoehorn this down perf's throat.
It's not like any of this is useful without having opened /proc/kcore.

Also, /proc/kcore is privileged, so the indefinite resource allocation
side effect in case user-space doesn't drain the lists is OK.

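To make the trade-off concrete, here is a rough userspace model of the
"retain until the reader reports a quiescent state" idea. It is only a
sketch under assumptions: retire_text(), read_retired(), quiescent_state()
and both structs are hypothetical names invented for illustration, not
part of any posted patch or of the kernel, and a real implementation
would defer the free of the original text rather than copy it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct retired_text {
	struct retired_text *next;
	unsigned long addr;	/* address the code occupied while live */
	size_t len;		/* length of the code in bytes */
	unsigned char *bytes;	/* the retained code itself */
};

struct retire_list {
	struct retired_text *head;
	size_t pending;		/* entries not yet released by the reader */
};

/* Called where the code would otherwise be freed: park it on the list. */
int retire_text(struct retire_list *rl, unsigned long addr,
		const void *code, size_t len)
{
	struct retired_text *rt;

	if (len == 0)
		return -1;
	rt = malloc(sizeof(*rt));
	if (!rt)
		return -1;
	rt->bytes = malloc(len);
	if (!rt->bytes) {
		free(rt);
		return -1;
	}
	memcpy(rt->bytes, code, len);
	rt->addr = addr;
	rt->len = len;
	rt->next = rl->head;
	rl->head = rt;
	rl->pending++;
	return 0;
}

/* Reader side: copy out retained code that starts at addr, if any. */
size_t read_retired(const struct retire_list *rl, unsigned long addr,
		    void *buf, size_t buflen)
{
	const struct retired_text *rt;

	for (rt = rl->head; rt; rt = rt->next) {
		if (rt->addr == addr) {
			size_t n = rt->len < buflen ? rt->len : buflen;

			memcpy(buf, rt->bytes, n);
			return n;
		}
	}
	return 0;
}

/* Reader announces a quiescent state: everything retired so far can go. */
size_t quiescent_state(struct retire_list *rl)
{
	size_t freed = 0;

	while (rl->head) {
		struct retired_text *rt = rl->head;

		rl->head = rt->next;
		free(rt->bytes);
		free(rt);
		freed++;
	}
	rl->pending = 0;
	return freed;
}
```

In this model nothing is released until the reader calls
quiescent_state(), so a reader that never drains lets the list grow
without bound. That is the indefinite allocation mentioned above, which
is tolerable only because the reader is privileged.
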
Thanks,
Ingo