Message-ID: <0c6ab8f1-9102-9fd0-a848-04c826a4d957@torproject.org>
Date: Fri, 12 Mar 2021 20:42:06 -0600
From: Jim Newsome <jnewsome@...project.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Oleg Nesterov <oleg@...hat.com>,
"Eric W . Biederman" <ebiederm@...ssion.com>,
Christian Brauner <christian@...uner.io>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5] do_wait: make PIDTYPE_PID case O(1) instead of O(n)
Here are the micro-benchmark results. I ended up reworking the benchmark
to use Google's benchmark library [1]. For each N, I timed how long it
took to fork a new child and then immediately wait on it, while already
having N other children. (Initially I tried vfork, but that didn't play
nicely with the benchmark tool.)
[1] https://github.com/google/benchmark
Without the patch:
$ ./bench_waitpid
2021-03-12T20:03:57-06:00
Running ./bench_waitpid
Run on (4 X 2693.88 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 4096 KiB (x4)
L3 Unified 16384 KiB (x4)
Load Average: 0.48, 0.16, 0.06
----------------------------------------------------------
Benchmark                 Time            CPU     Iterations
----------------------------------------------------------
BM_WaitPid/1         313354 ns       74754 ns        8575
BM_WaitPid/10        305954 ns       72424 ns        7977
BM_WaitPid/100       323126 ns       83368 ns        7861
BM_WaitPid/500       324479 ns       99071 ns        6541
BM_WaitPid/1000      377143 ns      178590 ns        3260
BM_WaitPid/5000      816597 ns      803919 ns         849
BM_WaitPid/8000     1282339 ns     1268179 ns         548
For example, this means that for a process that already has 1000
children, forking and then immediately calling waitpid on the 1001st
took an average of 179 microsecs of CPU time; with 5000 children it
rises to roughly 804 microsecs. The O(n) implementation starts getting
noticeably slower at around 100 children.
With the patch:
$ ./bench_waitpid
2021-03-12T20:19:52-06:00
Running ./bench_waitpid
Run on (4 X 2693.88 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 4096 KiB (x4)
L3 Unified 16384 KiB (x4)
Load Average: 1.39, 0.49, 0.18
----------------------------------------------------------
Benchmark                 Time            CPU     Iterations
----------------------------------------------------------
BM_WaitPid/1         305501 ns       74028 ns        9261
BM_WaitPid/10        309644 ns       74916 ns        8838
BM_WaitPid/100       319457 ns       77193 ns        8717
BM_WaitPid/500       306929 ns       73863 ns        8327
BM_WaitPid/1000      310849 ns       74848 ns        8458
BM_WaitPid/5000      329324 ns       79386 ns        9123
BM_WaitPid/8000      317991 ns       77889 ns        7526
As expected, the cost is about the same as the baseline for a small
number of children, and stays about the same as the number of children
increases.
Source:
#include <sys/times.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <vector>

#include "benchmark/benchmark.h"

static void BM_WaitPid(benchmark::State& state) {
  int num_children = state.range(0);

  // Create num_children children that exit immediately.
  std::vector<pid_t> children;
  children.reserve(num_children);
  for (int i = 0; i < num_children; ++i) {
    pid_t forkrv = fork();
    if (forkrv < 0) {
      perror("fork");
      exit(1);
    }
    if (forkrv == 0) {
      // child
      exit(0);
    }
    children.push_back(forkrv);
  }

  // The body of this loop is what gets timed.
  for (auto _ : state) {
    pid_t forkrv = fork();
    if (forkrv < 0) {
      perror("fork");
      exit(1);
    }
    if (forkrv == 0) {
      // child
      exit(0);
    }
    if (waitpid(forkrv, NULL, 0) < 0) {
      perror("waitpid");
      exit(1);
    }
  }

  // Tear down (reap) the other children.
  for (pid_t child : children) {
    if (waitpid(child, NULL, 0) < 0) {
      perror("waitpid");
      exit(1);
    }
  }
}

// Register the function as a benchmark
BENCHMARK(BM_WaitPid)->Arg(1)->Arg(10)->Arg(100)->Arg(500)->Arg(1000)->Arg(5000)->Arg(8000);

// Run the benchmark
BENCHMARK_MAIN();