Message-ID: <20210924033547.939554938@goodmis.org>
Date: Thu, 23 Sep 2021 23:35:47 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: linux-kernel@...r.kernel.org
Cc: Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
linux-trace-devel@...r.kernel.org
Subject: [PATCH 0/2] tracing: Have trace_pid_list be a sparse array
When the trace_pid_list was created, the default pid_max was 32768.
A bitmask holding one bit for each of those 32768 pids took up 4096 bytes
(one page). A one-page bitmask was not much of a problem, and that was
used for mapping pids. But today, systems are bigger and run more tasks,
and the default pid_max is usually set to 4194304, which requires 524288
bytes to hold a bit for every pid. Worse yet, pid_max can be set to 2^30
(1073741824, or 1G), which would take 134217728 bytes (128M) of memory to
store this array.
Since the pid_list array is very sparsely populated, storing a bit for
every possible pid is a huge waste of memory when most bits will never be
set. Instead, use a page-table scheme to store the array, and allow it to
handle pids up to 32 bits.
The pid_mask starts out with 1024 entries indexed by the 10 most
significant bits of the pid. This costs 4K on 32 bit architectures and 8K
on 64 bit. Each of these entries points to another 1024-entry array
indexed by the next 10 bits of the pid (another 4 or 8K). Those entries in
turn hold a 512-byte bitmask, which covers the 12 LSBs (4096 bits).
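The 10/10/12 split above can be sketched with a few helpers. This is an
illustrative user-space sketch, not the patch's code; the names
(pid_upper1, pid_upper2, pid_lower) are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths matching the description: 10 + 10 + 12 = 32 */
#define LOWER_BITS   12
#define UPPER2_BITS  10

/* Top 10 bits: index into the first-level 1024-entry array */
static inline uint32_t pid_upper1(uint32_t pid)
{
	return pid >> (LOWER_BITS + UPPER2_BITS);
}

/* Middle 10 bits: index into the second-level 1024-entry array */
static inline uint32_t pid_upper2(uint32_t pid)
{
	return (pid >> LOWER_BITS) & ((1 << UPPER2_BITS) - 1);
}

/* Bottom 12 bits: bit position within the 512-byte (4096-bit) bitmask */
static inline uint32_t pid_lower(uint32_t pid)
{
	return pid & ((1 << LOWER_BITS) - 1);
}
```

For example, the default pid_max of 4194304 is exactly 2^22, so only the
top-level index ever exceeds 0 for pids below it.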
When the trace_pid_list is allocated, the 4/8K upper-bits array is
allocated immediately, along with a cache of upper and lower chunks
(default 6 of each). Then, when a bit is "set", chunks are pulled from the
free list and added to the array. If the free list drops below a threshold
(default 2), an irqwork is triggered to refill the cache.
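A minimal sketch of the set/lookup walk through the three levels, under
the stated assumptions (1024-entry upper arrays, 512-byte lower bitmasks).
The struct and function names are hypothetical, and calloc() stands in for
the kernel's pre-filled chunk cache that the irqwork refills:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define UPPER_ENTRIES 1024              /* 2^10 entries per level */
#define LOWER_BYTES   512               /* 4096-bit bitmask for 12 LSBs */
#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct lower_chunk {
	unsigned long data[LOWER_BYTES / sizeof(unsigned long)];
};

struct upper_chunk {
	struct lower_chunk *lower[UPPER_ENTRIES];
};

struct pid_list {
	struct upper_chunk *upper[UPPER_ENTRIES];
};

/* Walk upper1 -> upper2 -> lower chunk, allocating chunks on demand.
 * In the kernel, the chunks would come off the free-list cache instead. */
static int pid_list_set(struct pid_list *pl, uint32_t pid)
{
	uint32_t u1 = pid >> 22;
	uint32_t u2 = (pid >> 12) & 0x3ff;
	uint32_t lo = pid & 0xfff;
	struct upper_chunk *uc = pl->upper[u1];
	struct lower_chunk *lc;

	if (!uc) {
		uc = calloc(1, sizeof(*uc));
		if (!uc)
			return -1;
		pl->upper[u1] = uc;
	}
	lc = uc->lower[u2];
	if (!lc) {
		lc = calloc(1, sizeof(*lc));
		if (!lc)
			return -1;
		uc->lower[u2] = lc;
	}
	lc->data[lo / BITS_PER_LONG] |= 1UL << (lo % BITS_PER_LONG);
	return 0;
}

/* Lookup: any missing chunk on the path means the bit is clear. */
static int pid_list_is_set(struct pid_list *pl, uint32_t pid)
{
	uint32_t u1 = pid >> 22;
	uint32_t u2 = (pid >> 12) & 0x3ff;
	uint32_t lo = pid & 0xfff;
	struct upper_chunk *uc = pl->upper[u1];
	struct lower_chunk *lc;

	if (!uc)
		return 0;
	lc = uc->lower[u2];
	if (!lc)
		return 0;
	return !!(lc->data[lo / BITS_PER_LONG] &
		  (1UL << (lo % BITS_PER_LONG)));
}
```

The sparseness win comes from the lookup path: a pid whose upper chunks
were never allocated costs only one or two pointer checks and no memory.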
On clearing a bit, if the clear causes the bitmask to become zero, that
chunk is placed back into the free cache for later use, keeping the need
to allocate more chunks to a minimum.
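The "recycle on empty" check above amounts to scanning the 512-byte
bitmask after each clear. A standalone sketch (hypothetical names; in the
kernel the empty chunk would go back on the free cache rather than being
freed):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOWER_WORDS (512 / sizeof(unsigned long))
#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct lower_chunk {
	unsigned long data[LOWER_WORDS];
};

/* True if every word of the 4096-bit mask is zero. */
static int chunk_is_empty(struct lower_chunk *lc)
{
	for (size_t i = 0; i < LOWER_WORDS; i++)
		if (lc->data[i])
			return 0;
	return 1;
}

/* Clear one bit; return nonzero if the whole chunk is now empty and can
 * be returned to the free cache. */
static int clear_bit_and_check(struct lower_chunk *lc, unsigned int bit)
{
	lc->data[bit / BITS_PER_LONG] &= ~(1UL << (bit % BITS_PER_LONG));
	return chunk_is_empty(lc);
}
```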
Steven Rostedt (VMware) (2):
tracing: Place trace_pid_list logic into abstract functions
tracing: Create a sparse bitmask for pid filtering
----
kernel/trace/Makefile | 1 +
kernel/trace/ftrace.c | 6 +-
kernel/trace/pid_list.c | 551 ++++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.c | 78 +++----
kernel/trace/trace.h | 14 +-
kernel/trace/trace_events.c | 6 +-
6 files changed, 595 insertions(+), 61 deletions(-)
create mode 100644 kernel/trace/pid_list.c