Message-ID: <1130245502.6977.1614289590089.JavaMail.zimbra@efficios.com>
Date: Thu, 25 Feb 2021 16:46:30 -0500 (EST)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: rostedt <rostedt@...dmis.org>
Cc: Michael Jeanson <mjeanson@...icios.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Alexei Starovoitov <ast@...nel.org>,
Yonghong Song <yhs@...com>, paulmck <paulmck@...nel.org>,
Ingo Molnar <mingo@...hat.com>, acme <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
"Joel Fernandes, Google" <joel@...lfernandes.org>,
bpf <bpf@...r.kernel.org>
Subject: Re: [RFC PATCH 0/6] [RFC] Faultable tracepoints (v2)
----- On Feb 24, 2021, at 1:14 PM, rostedt rostedt@...dmis.org wrote:
> On Wed, 24 Feb 2021 11:59:35 -0500 (EST)
> Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
>>
>> As a prototype solution, what I've done currently is to copy the user-space
>> data into a kmalloc'd buffer in a preparation step before disabling preemption
>> and copying data over into the per-cpu buffers. It works, but I think we should
>> be able to do it without the needless copy.
>>
>> What I have in mind as an efficient solution (not implemented yet) for the LTTng
>> kernel tracer goes as follows:
>>
>> #define COMMIT_LOCAL 0
>> #define COMMIT_REMOTE 1
>>
>> - faultable probe is called from system call tracepoint
>>   [ preemption/blocking/migration is allowed ]
>> - probe code calculates the length that needs to be reserved to store the
>>   event (e.g. user strlen),
>>
>> - preempt disable -> [ preemption/blocking/migration is not allowed from here ]
>> - reserve_cpu = smp_processor_id()
>> - reserve space in the ring buffer for reserve_cpu
>>   [ from that point on, we have _exclusive_ access to write into the ring
>>     buffer "slot" from any cpu until we commit. ]
>> - preempt enable -> [ preemption/blocking/migration is allowed from here ]
>>
>
> So basically the commit position here doesn't move until this task is
> scheduled back in and the commit (remote or local) is updated.

Indeed.

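To make the reserve/commit flow discussed above concrete, here is a minimal
user-space sketch. It is a model under stated assumptions, not LTTng or ftrace
code: struct cpu_buf, struct slot, reserve(), write_slot(), commit() and
uncommitted() are hypothetical names, the CPU number is passed in explicitly
rather than read with smp_processor_id(), the buffer does not wrap, and the
commit step is inferred from the COMMIT_LOCAL/COMMIT_REMOTE defines and from
the summary above rather than quoted from the proposal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define COMMIT_LOCAL  0
#define COMMIT_REMOTE 1

#define NR_CPUS  2   /* hypothetical: two modelled CPUs */
#define BUF_SIZE 64  /* hypothetical: tiny linear buffer, no wrap-around */

struct cpu_buf {
	atomic_size_t reserved;  /* bytes handed out to writers */
	atomic_size_t committed; /* bytes whose writers have finished */
	char data[BUF_SIZE];
};

static struct cpu_buf bufs[NR_CPUS];

struct slot {
	int reserve_cpu; /* CPU whose buffer holds the slot */
	size_t offset;
	size_t len;
};

/*
 * Models the short preempt-disabled section: claim len bytes in the buffer
 * of 'cpu'. Returns 0 on success, -1 if the buffer cannot hold len bytes.
 */
static int reserve(int cpu, size_t len, struct slot *s)
{
	struct cpu_buf *b;
	size_t old;

	assert(cpu >= 0 && cpu < NR_CPUS);
	b = &bufs[cpu];
	old = atomic_load(&b->reserved);
	do {
		if (len > BUF_SIZE - old)
			return -1;
	} while (!atomic_compare_exchange_weak(&b->reserved, &old, old + len));

	s->reserve_cpu = cpu;
	s->offset = old;
	s->len = len;
	return 0;
}

/* Models the faultable copy: the slot is exclusively ours until commit. */
static void write_slot(const struct slot *s, const void *src)
{
	memcpy(bufs[s->reserve_cpu].data + s->offset, src, s->len);
}

/*
 * Publish the slot and report which kind of commit happened. Both paths use
 * the same atomic add here; the local case is where a kernel implementation
 * could presumably use cheaper CPU-local operations (assumption).
 */
static int commit(int current_cpu, const struct slot *s)
{
	atomic_fetch_add(&bufs[s->reserve_cpu].committed, s->len);
	return current_cpu == s->reserve_cpu ? COMMIT_LOCAL : COMMIT_REMOTE;
}

/* Bytes reserved but not yet committed: non-zero means a writer is pending. */
static size_t uncommitted(int cpu)
{
	return atomic_load(&bufs[cpu].reserved) -
	       atomic_load(&bufs[cpu].committed);
}
```

In this model, a task that reserves on CPU 0, gets migrated, and commits from
CPU 1 takes the COMMIT_REMOTE path, and uncommitted(0) stays non-zero until
that commit lands, which is the "commit position doesn't move" behaviour
described above.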
> To put it in terms of the ftrace ring buffer, where we have both a commit
> page and a commit index, and it only gets moved by the first one to start a
> commit stack (that is, interrupts that interrupted a write will not
> increment the commit).
The tricky part for ftrace is its reliance on the fact that the concurrent
users of the per-cpu ring buffer are all nested contexts. LTTng does not
assume that and has been designed to be used both in kernel and user-space:
lttng-modules and lttng-ust share a lot of ring buffer code. Therefore,
LTTng's ring buffer supports preemption/migration of concurrent contexts.
The fact that LTTng uses local-atomic-ops on its kernel ring buffers is just
an optimization on an overall ring buffer design meant to allow preemption.
> Now, I'm not sure how LTTng does it, but I could see issues for ftrace to
> try to move the commit pointer (the pointer to the new commit page), as the
> design is currently dependent on the fact that it can't happen while
> commits are taking place.
Indeed, what makes it easy for LTTng is that the ring buffer has been
designed to support preemption/migration from the ground up.
> Are the pages of the LTTng indexed by an array of pages?
Yes, they are. Handling the initial page allocation and then the tracer copy
of data to/from the ring buffer pages is the responsibility of the LTTng lib
ring buffer "backend".

The LTTng lib ring buffer backend is somewhat similar to a page table done in
software, where the top level of the page table can be dynamically updated
when doing flight recorder tracing. It is however completely separate from the
space reservation/commit scheme, which is handled by the lib ring buffer
"frontend".

The algorithm I described in my prior email is specifically targeted at the
frontend layer, leaving the "backend" unchanged.
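
As a rough illustration of the "page table done in software" idea, here is a
minimal user-space sketch. It is not the lib ring buffer backend code: struct
backend, backend_write() and backend_swap_page() are hypothetical names, and
the 16-byte page size is chosen only to keep the example small. It shows an
offset being translated through a top-level array of page pointers, and one
entry of that array being replaced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BE_PAGE_SHIFT 4                     /* hypothetical 16-byte pages */
#define BE_PAGE_SIZE  ((size_t)1 << BE_PAGE_SHIFT)
#define BE_NR_PAGES   4

struct backend {
	char *pages[BE_NR_PAGES]; /* top-level table: page index -> storage */
};

/*
 * Copy len bytes to a logical buffer offset, translating through the page
 * array and splitting the copy at page boundaries. The offset wraps around
 * the buffer, as it would in a circular buffer.
 */
static void backend_write(struct backend *be, size_t offset,
			  const void *src, size_t len)
{
	const char *p = src;

	while (len) {
		size_t idx = (offset >> BE_PAGE_SHIFT) % BE_NR_PAGES;
		size_t in_page = offset & (BE_PAGE_SIZE - 1);
		size_t chunk = BE_PAGE_SIZE - in_page;

		if (chunk > len)
			chunk = len;
		memcpy(be->pages[idx] + in_page, p, chunk);
		offset += chunk;
		p += chunk;
		len -= chunk;
	}
}

/*
 * Replace one top-level entry with a spare page and return the old page.
 * This models updating the top level of the table; writers keep using the
 * same offsets and never see the change.
 */
static char *backend_swap_page(struct backend *be, size_t idx, char *spare)
{
	char *old;

	assert(idx < BE_NR_PAGES);
	old = be->pages[idx];
	be->pages[idx] = spare;
	return old;
}
```

The point of the indirection is that the reservation/commit logic only deals
in offsets, so it does not need to know which physical page currently backs a
given offset.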

For some reason, I suspect the ftrace ring buffer combines those two layers
into a single algorithm, which may have its advantages, but seems to
strengthen its dependency on only having nested contexts sharing a given
per-cpu ring buffer.

Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com