Message-ID: <20200812075135.GA191218@gmail.com>
Date: Wed, 12 Aug 2020 00:51:35 -0700
From: Andrei Vagin <avagin@...il.com>
To: Eugene Lubarsky <elubarsky.linux@...il.com>
Cc: linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, adobriyan@...il.com,
dsahern@...il.com
Subject: Re: [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all
processes
On Tue, Aug 11, 2020 at 12:58:47AM +1000, Eugene Lubarsky wrote:
> This is an idea for substantially reducing the number of syscalls needed
> by monitoring tools whilst mostly re-using the existing API.
>
> The proposed files in this proof-of-concept patch set are:
>
> * /proc/all/stat
> A stat line for each process in the existing format.
>
> * /proc/all/statm
> statm lines but starting with a PID column.
>
> * /proc/all/status
> status info for all processes in the existing format.
>
> * /proc/all/io
> The existing /proc/pid/io data but formatted as a single line for
> each process, similarly to stat/statm, with a PID column added.
>
> * /proc/all/statx
> Gathers info from stat, statm and io; the purpose is actually
> not so much to reduce syscalls but to help userspace be more
> efficient by not having to store data in e.g. hashtables in order
> to gather it from separate /proc/all/ files.
>
> The format proposed here starts with the unchanged stat line
> and begins the other info with a few characters, repeating for
> each process:
>
> ...
> 25 (cat) R 1 1 0 0 -1 4194304 185 0 16 0 2 0 0 0 20 ...
> m 662 188 167 5 0 112 0
> io 4292 0 12 0 0 0 0
> ...
>
>
> There has been a proposal with some overlapping goals: /proc/task-diag
> (https://github.com/avagin/linux-task-diag), but I'm not sure about
> its current status.
I rebased the task_diag patches on top of v5.8:
https://github.com/avagin/linux-task-diag/tree/v5.8-task-diag
/proc/pid files have three major limitations:
* They require at least three syscalls per process per file:
  open(), read(), close().
* They come in a variety of formats, mostly text based.
  The kernel spends time encoding binary data into text, and then
  tools like top and ps spend time decoding it back into binary form.
* They are sometimes slow due to extra attributes.
  For example, /proc/PID/smaps contains a lot of useful information
  about memory mappings and the memory consumption of each of them.
  But even if we don't need the memory consumption fields, the kernel
  still spends time collecting that information.
More details and numbers are in this article:
https://avagin.github.io/how-fast-is-procfs
This new interface removes only the first of these limitations, whereas
task_diag is free of all three.
And here is how fast each of these interfaces is.
The test environment:
CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
RAM: 16GB
kernel: v5.8 with task_diag and /proc/all patches.
100K processes:
$ ps ax | wc -l
10228
$ time cat /proc/all/status > /dev/null
real 0m0.577s
user 0m0.017s
sys 0m0.559s
task_proc_all is used to read /proc/pid/status for all tasks:
https://github.com/avagin/linux-task-diag/blob/master/tools/testing/selftests/task_diag/task_proc_all.c
$ time ./task_proc_all status
tasks: 100230
real 0m0.924s
user 0m0.054s
sys 0m0.858s
/proc/all/status is about 40% faster than /proc/*/status.
Now let's take a look at the perf output:
$ time perf record -g cat /proc/all/status > /dev/null
$ perf report
- 98.08% 1.38% cat [kernel.vmlinux] [k] entry_SYSCALL_64
- 96.70% entry_SYSCALL_64
- do_syscall_64
- 94.97% ksys_read
- 94.80% vfs_read
- 94.58% proc_reg_read
- seq_read
- 87.95% proc_pid_status
+ 13.10% seq_put_decimal_ull_width
- 11.69% task_mem
+ 9.48% seq_put_decimal_ull_width
+ 10.63% seq_printf
- 10.35% cpuset_task_status_allowed
+ seq_printf
- 9.84% render_sigset_t
1.61% seq_putc
+ 1.61% seq_puts
+ 4.99% proc_task_name
+ 4.11% seq_puts
- 3.76% render_cap_t
2.38% seq_put_hex_ll
+ 1.25% seq_puts
2.64% __task_pid_nr_ns
+ 1.54% get_task_mm
+ 1.34% __lock_task_sighand
+ 0.70% from_kuid_munged
0.61% get_task_cred
0.56% seq_putc
0.52% hugetlb_report_usage
0.52% from_kgid_munged
+ 4.30% proc_all_next
+ 0.82% _copy_to_user
We can see that the kernel spends more than 50% of the time encoding
binary data into a text format.
Now let's see how fast task_diag is:
$ time ./task_diag_all all -c -q
real 0m0.087s
user 0m0.001s
sys 0m0.082s
Maybe we need to resurrect the task_diag series instead of inventing
another, less effective interface...
Thanks,
Andrei
>
>
>
> Best Wishes,
>
> Eugene
>
>
> Eugene Lubarsky (5):
> fs/proc: Introduce /proc/all/stat
> fs/proc: Introduce /proc/all/statm
> fs/proc: Introduce /proc/all/status
> fs/proc: Introduce /proc/all/io
> fs/proc: Introduce /proc/all/statx
>
> fs/proc/base.c | 215 +++++++++++++++++++++++++++++++++++++++++++--
> fs/proc/internal.h | 1 +
> fs/proc/root.c | 1 +
> 3 files changed, 210 insertions(+), 7 deletions(-)
>
> --
> 2.25.1
>